Sr. Hadoop & Spark Developer Resume
Kansas City, MO
SUMMARY
- Over 11 years of experience in designing and developing client/server and web-based applications using J2EE technologies, including 5.5 years of Big Data experience with strong knowledge of HDFS and the Hadoop ecosystem.
- Hands-on experience installing and configuring Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Spark, HCatalog, Sqoop, Kafka, Pig and Flume.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experience using application development frameworks such as Hibernate, Struts and Spring MVC to build integrated applications with Java/J2EE technologies including Servlets, JSP, JDBC and JSTL; experience developing service components using JDBC.
- Experience ingesting real-time/near-real-time data using Flume, Kafka and Storm.
- Experience importing and exporting data between relational databases and HDFS using Sqoop; hands-on experience with Linux systems.
- Experience using SequenceFile, Avro and Parquet file formats, and managing and reviewing Hadoop log files; good knowledge of writing Spark applications using Python, Scala and Java.
- Experience writing MapReduce jobs; efficient in analyzing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance (see the Spark/Hive sketch after this list).
- Good experience with data transformation and storage: HDFS, MapReduce, Spark.
- Good understanding of HDFS architecture; experienced in database development, ETL, OLAP and OLTP.
- Experience in architecting Hadoop clusters using major Hadoop distributions: CDH3 and CDH4.
- Knowledge of UI development, UX design, front-end development, rich user interface design, visual design and team management.
- Experience in designing and developing SOAP and REST web services.
- Extensive experience writing SQL queries for MySQL Server and Hadoop, using SQL*Plus.
- Extensive experience in requirements gathering, analysis, design, reviews, coding, code reviews, and unit and integration testing; experienced in picking the right AWS services for the application.
- Hands-on experience working with Oracle, NoSQL databases and DB2.
- Expertise in using IDEs such as WebSphere Studio (WSAD), Eclipse, NetBeans, MyEclipse and WebLogic Workshop.
- Knowledge of Python programming.
- Experience developing and maintaining applications on the AWS platform.
- Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service, Amazon Simple Notification Service, Amazon Simple Workflow Service, AWS Elastic Beanstalk and AWS CloudFormation.
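The Spark and Hive partitioning experience summarized above can be illustrated with a minimal Java sketch (the HDFS path, database and table names, and the assumed event_date column are hypothetical placeholders, not project values): it reads Parquet data from HDFS and rewrites it as a Hive table partitioned by date so that queries can prune partitions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartitionedHiveLoad {
    public static void main(String[] args) {
        // Hive-enabled SparkSession so saveAsTable writes through the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("partitioned-hive-load")
                .enableHiveSupport()
                .getOrCreate();

        // Read raw Parquet data from HDFS (path is a placeholder).
        Dataset<Row> events = spark.read().parquet("hdfs:///data/raw/events");

        // Rewrite as a Hive table partitioned by event_date (dynamic partitioning),
        // letting HiveQL/Spark SQL queries prune partitions for faster scans.
        events.write()
              .mode(SaveMode.Overwrite)
              .partitionBy("event_date")
              .saveAsTable("analytics.events_partitioned");

        spark.stop();
    }
}
```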
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, Spark 1.2/1.6, MapReduce, Pig, Hive, Flume, Sqoop, Scala, Oozie, ZooKeeper, Spark SQL, Spark Streaming, Kafka, HBase, Cassandra, Impala
Hadoop Distributions: Cloudera CDH3/4/5, Hortonworks, MapR 5.1/5.2
IDEs: Eclipse, NetBeans and Scala IDE
NoSQL Databases: HBase, MongoDB, Cassandra
Web Services: SOAP, REST and XML
Frameworks: MVC, Struts, Hibernate and Spring
Programming Languages: C, Java, Python and Linux shell scripts
SQL Databases: MySQL, DB2, MS SQL Server, Oracle 9i/10g/11g
Network Protocols: TCP/IP, HTTP, DNS
PROFESSIONAL EXPERIENCE
Confidential, Kansas City, MO
Sr. Hadoop & Spark Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, Sqoop and Spark.
- Developed Spark code using Scala for faster processing of data.
- Followed the Agile development methodology to develop the application.
- Installed and configured Hadoop clusters on different platforms such as Cloudera, Pivotal HD and AWS EMR, along with ecosystem components including Sqoop, HBase, Hive and Spark.
- Developed Spark SQL code to load tables into HDFS and run select queries on top of them.
- Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this section).
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
- Integrated Apache Storm with Kafka to perform web analytics.
- Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging and extract creation, source archival, job scheduling and error handling.
- Worked with the Talend ETL tool, using features such as context variables and database/file components like tOracleInput, tOracleOutput, tFileCompare, tFileCopy and tOracleClose.
- Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
- Developed ETL mappings using mapplets and re-usable transformations, and various transformations such as source qualifier, expression, connected and un-connected lookup, router, aggregator, filter, sequence generator, update strategy, normalizer, joiner and rank in PowerCenter Designer; experience with NoSQL databases such as HBase and MongoDB.
- Created, altered and deleted topics (Kafka queues) as required.
- Performed performance tuning using partitioning and bucketing of Impala tables.
- Involved in cluster maintenance and monitoring.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in loading data from the UNIX file system to HDFS.
- Created an e-mail notification service that notifies the requesting team upon completion of a job; worked on NoSQL databases, which differ from classic relational databases.
- Conducted requirements gathering sessions with various stakeholders
- Involved in knowledge transition activities to the team members.
- Successful in creating and implementing complex code changes.
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch; experience with S3, CloudFront and Route 53.
Environment: Hadoop v2/YARN 2.4, Spark, AWS, MapReduce, Teradata 15.0, Hive, REST, Sqoop, Flume, Pig, Cloudera, Kafka, SSRS.
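The Kafka-to-Spark Streaming integration called out above can be sketched as follows. This is an illustrative Java example against the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic, consumer group and HDFS output path are placeholders rather than project values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ClickstreamIngest {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("clickstream-ingest");
        // Spark Streaming groups incoming records into 10-second micro-batches.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-consumers");      // placeholder group
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from a (placeholder) clickstream topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("clickstream"), kafkaParams));

        // Persist each non-empty micro-batch to HDFS; downstream Hive/HBase
        // loads can then pick the files up.
        stream.map(record -> record.value())
              .foreachRDD(rdd -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("hdfs:///data/clickstream/" + System.currentTimeMillis());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```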
Confidential, Bellevue, WA
Sr. Hadoop Developer
Responsibilities:
- Evaluated the suitability of Hadoop and its ecosystem for the project and implemented various proof-of-concept (POC) applications with a view to adopting them and benefiting from the Big Data Hadoop initiative.
- Estimated software and hardware requirements for the NameNode and DataNodes and planned the cluster.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
- Wrote MapReduce programs and Hive UDFs in Java where the required functionality was too complex (see the UDF sketch after this section).
- Involved in loading data from the Linux file system to HDFS.
- Developed Hive queries for the analysis, to categorize different items.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
- Delivered a POC of Flume to handle real-time log processing for attribution reports.
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
- Reviewed peer table creation in Hive, data loading and queries.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Responsible for managing test data coming from different sources.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Created and maintained Technical Documentation for launching Hadoop Clusters and for executing Hive queries and Pig scripts.
- Involved in Unit testing, Interface testing, system testing, and user acceptance testing of the workflow tool.
Environment: Apache Hadoop, HDFS, Hive, Spark, MapReduce, Java, Flume, Cloudera, Oozie, MySQL, UNIX, Core Java.
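A minimal sketch of a Java Hive UDF of the sort described in this role; the function name and normalization logic are illustrative assumptions, not the original code.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF: normalizes a raw category string (trim + lower-case)
 * so items can be grouped consistently in HiveQL. After packaging the class
 * into a jar, it would be registered with:
 *   ADD JAR normalize-udf.jar;
 *   CREATE TEMPORARY FUNCTION normalize_category AS 'NormalizeCategory';
 */
public class NormalizeCategory extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // pass NULL through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```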
Confidential, Baltimore MD
Hadoop Developer
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Installed and configured Hive on the Hadoop cluster.
- Worked closely (face-to-face) with accountants, financial analysts, data analysts, data scientists, statisticians, compliance, sales, marketing, pricing strategists and product development.
- Developed simple to complex MapReduce and streaming jobs in Java, alongside jobs implemented in Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using compression mechanisms.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Used Impala to query the Hadoop data stored in HDFS.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybanks and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
- Performed data analysis on large datasets and presented results to the risk, finance, accounting, pricing, sales, marketing and compliance teams.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats (see the sketch after this section).
Environment: Hadoop, Pig, Hive, Cloudera Manager, HiveQL, 30-node cluster on Linux (Ubuntu)
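For illustration, a minimal Java MapReduce job of the kind referenced above, counting records per category extracted from CSV input; the column index, class names and input/output paths are assumptions for the example, not the original code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CategoryCount {

    // Mapper: pull the category column (assumed to be index 2) out of each CSV line.
    public static class CategoryMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text category = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 2) {
                category.set(fields[2].trim());
                context.write(category, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sum the per-category counts.
    public static class CategoryReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "category-count");
        job.setJarByClass(CategoryCount.class);
        job.setMapperClass(CategoryMapper.class);
        job.setCombinerClass(CategoryReducer.class);
        job.setReducerClass(CategoryReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```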
Confidential, New York
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Experienced in managing and reviewing Hadoop log files.
- Load and transform large sets of structured, semi structured and unstructured data.
- Responsible for data coming from different sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
- Gained good experience with NoSQL databases; involved in loading data from the UNIX file system to HDFS.
- Supported MapReduce programs running on the cluster.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs (see the sketch after this section); installed and configured Hive and wrote Hive UDFs.
Environment: Linux, Hadoop, Java 6, HBase, Eclipse, Hive, MySQL, Sqoop, Pig, Flume.
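A minimal Java sketch of the Hive table creation and querying workflow mentioned above, assuming a HiveServer2 JDBC endpoint; the connection URL, table definition and staging path are placeholders, and the query shown is simply the kind that Hive compiles into MapReduce jobs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTableExample {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (endpoint below is illustrative).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "", "");
             Statement stmt = conn.createStatement()) {

            // Create a table over comma-delimited files and load staged data into it.
            stmt.execute("CREATE TABLE IF NOT EXISTS orders "
                       + "(id BIGINT, customer STRING, amount DOUBLE) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            stmt.execute("LOAD DATA INPATH '/data/staging/orders' INTO TABLE orders");

            // An aggregation like this runs as MapReduce jobs on the cluster.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")) {
                while (rs.next()) {
                    System.out.println(rs.getString("customer") + "\t" + rs.getDouble("total"));
                }
            }
        }
    }
}
```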
Confidential
Java Developer
Responsibilities:
- Involved in design and implementation of server-side programming.
- Involved in gathering and analyzing requirements and preparing high-level documents.
- Participated in all client meetings to understand the requirements.
- Actively involved in designing and data modelling using Rational Rose Tool (UML).
- Involved in the design of the SPACE database.
- Designed and developed user interfaces and menus using HTML, JSP, JSP custom tags and JavaScript.
- Implemented the user interface using the Spring Tiles framework.
- Case details provided by the Tuxedo server are fetched with the help of web services technology (i.e., binding, finding a service, use of the XML message format, etc.).
- Involved in integrating the system with BT systems such as GTC and CSS, through the eLink hub and IBM MQSeries; developed, deployed and tested JSPs and Servlets in WebLogic.
- Used Eclipse as the IDE, integrated WebLogic with Eclipse to develop and deploy the applications, and used JDBC to connect to the database.
Environment: Struts Framework, Java 1.3, XML, Data Modelling, JDBC, SQL, Pl/SQL, JMS, Web Services, SOAP, Solaris 9, ANT tool, Toad, Eclipse.
Confidential
Java Developer
Responsibilities:
- Involved in requirements collection & analysis from the business team.
- Created design documents with use case, class and sequence diagrams using Rational Rose; implemented the MVC architecture using the Apache Struts framework.
- Implemented Action classes and server-side validations for account activity, payment history and transactions (see the sketch after this section); implemented views using Struts tags, JSTL and Expression Language.
- Implemented session beans to handle business logic for the fund transfer, loan, credit card and fixed deposit modules; worked with Java patterns such as singleton and factory at the business layer for effective object behavior.
- Worked with the Java Collections API for handling data objects between the business layers and the front end; developed unit test cases using JUnit.
- Developed Ant scripts and builds using Apache Ant.
- Used ClearCase for source code maintenance.
Environment: J2EE1.4, Java2, Tiles, JSP1.2, Java Mail, ClearCase, ANT, JavaScript, JMS
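A minimal sketch of a Struts 1 Action class with a server-side validation check, in the spirit of the work described above; the action name, request parameter and forward names are hypothetical placeholders.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class AccountActivityAction extends Action {
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request, HttpServletResponse response)
            throws Exception {
        // Read the account number submitted by the JSP form (parameter name is illustrative).
        String accountNumber = request.getParameter("accountNumber");

        // Server-side validation: reject a missing or malformed account number.
        if (accountNumber == null || !accountNumber.matches("\\d{10}")) {
            return mapping.findForward("failure");
        }

        // In the real application the business logic would be delegated to a
        // session bean; here the validated value is simply handed to the view.
        request.setAttribute("accountNumber", accountNumber);
        return mapping.findForward("success");
    }
}
```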