`
runfeel
  • 浏览: 907378 次
文章分类
社区版块
存档分类
最新评论

Spring Batch Framework– introduction chapter(上)

 
阅读更多

Bacth processes are hard to write-especially when using ageneral language like Java. Batch jobs run every night, making it easy formillions of people to do things like banking, online shopping, querying billinginformation.

Spring Batch is Java Framework that makes it easy to writebatch applications. Batch applications invlove reliably and efficientlyprocessing large volumes of data to and from various data sources (files,databases, and so on). Spring Batch is great at doing this and provides thenecessary foundation to meet the stringent requirements of batch appliocations.Sir Isaac Newton said, “If I have seen further it is only by standing on theshoulders of giants.” Spring batch builds on the shoulders of one giant inpraticular: the Spring Framework. Spring is the framework of choice for asignificant segment of the Enterprise Java development market. Spring Batchmakes the Spring programming model – based on simplicity and efficiency –easier to apply for batch applications.

What are batch applications? Batch applications processlarge amounts of data without human intervention. You’d opt to use bacthapplications to compute data for generating monthly financial statements,calculating statistics, and indexing files.

The most common scenario for a batch application isexporting data to files from one system and processing them in another. A batchapplication processes data automatically, so it must be robust and reliablebecause there is no human interaction to recover from an error. The greater thevolume of data a batch application must process, the longer it takes tocomplete. This means you must also consider performance in your batchapplication because it’s often restricted to execute within a specific timewindow. Every day, large and complex calculations take place to index billionsof documents, using cutting-edge algorithms like MapReduce. For data exchange,message-based solutions are also popular, having the advantage over batchapplications of being(close to) real time.




The goal of the Spring Batch project is to provide an opensource batch-oritened framework that effectively addresses the most common needsof batch applications.

Spring Batch isn’t a scheduler!

Spring Batch drives batch jobs (we use the terms job, batch,and process interchangeably) but doesn’t provide advanced support to launchthem according to a schedule. Spring Batch leaves this task to dedicatedschedulers like Quartz and cron. A scheduler triggers the launching of SpringBatch jobs by accessing the Spring Batch runtime ( like Quartz because it’s ajava solution) or by launching a dedicated JVM process( in the case of cron).Sometimes a scheduler launches batch jobs in sequence; first job A, and thenjob B if A succeeded, or job C if A failed. The scheduler can use the filesgenerated by the jobs or exit codes to orchestrate the sequence. Spring Batchcan aslo orchestrate such sequences itself; Spring Batch jobs are made ofsteps, and you can easily configure the sequence by using Spring Batch XML.

Should a whole batch fail because of one badly formattedline? Not always. The decision to skip an incorrect line or an incorrect itemis declarative in Spring Batch. It’s all about configuration. Components can trackeverything they do, and the framework provides them with the execution data onrestart. The components then know ehre they left off and can restart processingat the right place.


Spring Batch processes items in chunks. A job reads andwrites items in small chunks. Chunk processing allows streaming data instead ofloading all the data in memory. By default, chunk processing is single threadedand susally performs well. But some batch jobs need to execute faster, soSpring Batch provides ways to make chunk processing multi-threaded and todistribute processing on multiple physical nodes.

Partitioning splits a step into substeps, each of whichhandles a specific portion of the data. This implies that you know thestructure of the input data and that you know in advance how to distribute databetween substeps. Distribution can take place by ranges of primary key valuesfor database records or by directories for files. The substeps can executelocally or remotely, and Spring Batch provides support for multi-threadedsubsteps.

Spring Batch and grid computing

When dealing with large amounts of data—petabytes-a popularsolution to scaling is to divide the enormous amounts of computations intosmaller chunks, compute them in parallel(usually on different nodes), and thengather the results. Some open source frameworks(Haddop, GridGain, andHazelcast, for example) have appeared in the last few years to deal with theburden of distributing units of work so that developers can focus on developingthe computations themselves. How does Spring Batch compare to thesegrid-computing frameworks? Spring Batch is a loghtweight solution: all it needsis the Java Runtime installed, whereas grid-computing frameworks need a moreadvanced infrastucture. As an example, Hadoop usually works on top of its owndistributed fle system, HDFS. In terms of features, Spring Batch provides a lotof support to work with flat files, XML files, and relational database.


分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics