
Sr. Hadoop Developer Resume


Dallas, TX

PROFESSIONAL SUMMARY:

  • 8 years of total IT experience across all phases of Hadoop and Java development, along with experience in application development, data modeling, and data mining.
  • Good experience with Big Data ecosystems and ETL.
  • Expertise in Java, Python and Scala
  • Experience in data architecture, including data ingestion pipeline design, data analysis and analytics, and advanced data processing; experience optimizing ETL workflows.
  • Experience in Hadoop (Cloudera, Hortonworks, MapR, IBM BigInsights) architecture, deployment, and development.
  • Experience in extracting source data from sequential files, XML files, and Excel files, then transforming and loading it into the target data warehouse.
  • Expertise in Java/J2EE technologies such as Core Java, Struts, Hibernate, JDBC, JSP, JSTL, HTML, JavaScript, and JSON.
  • Experience with SQL and NoSQL databases (MongoDB, Cassandra).
  • Hands-on experience with Hadoop core components (HDFS, MapReduce) and the Hadoop ecosystem (Sqoop, Flume, Hive, Pig, Impala, Oozie, HBase).
  • Experience in ingesting real-time/near-real-time data using Flume, Kafka, and Storm.
  • Experience in importing and exporting data between relational databases and HDFS using Sqoop.
  • Hands-on experience with Linux systems.
  • Experience using SequenceFile, Avro, and Parquet file formats; managing and reviewing Hadoop log files.
  • Good knowledge of writing Spark applications using Python, Scala, and Java (a minimal Java sketch follows this list).
  • Experience in writing MapReduce jobs.
  • Efficient in analyzing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance.
  • Good experience with transformation and storage: HDFS, MapReduce, Spark.
  • Good understanding of HDFS architecture.
  • Experienced in database development, ETL, OLAP, and OLTP.
  • Knowledge of extracting an Avro schema using avro-tools and evolving an Avro schema by changing its JSON definition.
  • Experience using HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL stores, and a variety of portfolios.
  • Experience in UNIX shell scripting.
  • Experience developing and maintaining applications on the AWS platform.
  • Experience developing and maintaining applications built on Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service, Amazon Simple Notification Service, Amazon Simple Workflow Service, AWS Elastic Beanstalk, and AWS CloudFormation.
  • Experience selecting the right AWS services for the application.
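As a rough illustration of the Spark batch work summarized above, the sketch below reads delimited source data from HDFS with the Java API and writes it back out as partitioned Parquet. It assumes the Spark 2.x SparkSession API; the paths and the claim_year partition column are hypothetical placeholders rather than details from an actual project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Minimal sketch of a Spark batch ingestion job in Java (Spark 2.x API assumed).
// Input path, partition column, and output path are hypothetical placeholders.
public class ClaimIngestJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("claim-ingest")
                .getOrCreate();

        // Read delimited source data from HDFS, treating the first row as a header.
        Dataset<Row> claims = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/raw/claims/");          // hypothetical path

        // Write the data back out as Parquet, partitioned for query pruning.
        claims.write()
                .mode(SaveMode.Overwrite)
                .partitionBy("claim_year")                  // hypothetical partition column
                .parquet("hdfs:///data/curated/claims/");   // hypothetical path

        spark.stop();
    }
}
```

Partitioning the output this way is what makes the static/dynamic partition pruning mentioned under query-performance tuning possible downstream.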

WORK EXPERIENCE:

Confidential, Dallas, TX

Sr. Hadoop Developer

Responsibilities:

  • Helped the team increase the cluster size from 55 nodes to 145+ nodes; the configuration for the additional data nodes was managed using Puppet.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs.
  • Integrated Apache Spark with Hadoop components.
  • Used Java for data cleaning and preprocessing.
  • Wrote extensive HDFS commands and Pig Latin scripts.
  • Developed complex queries using Hive and Impala.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
  • Worked on importing and exporting data between HDFS and a MySQL database using Sqoop.
  • Implemented MapReduce jobs through Hive by querying the available data.
  • Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Wrote Hive and Pig scripts per requirements.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Developed Spark applications using Scala.
  • Implemented Apache Spark data processing project to handle data from RDBMS and streaming sources.
  • Designed batch processing jobs using Apache Spark, increasing speeds ten-fold compared to equivalent MapReduce jobs.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
  • Integrated Kafka with Spark Streaming for high-speed data processing (a minimal Java sketch follows this list).
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive through Storm.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging and extract creation, source archival, job scheduling, and error handling.
  • Worked with the Talend ETL tool, using features such as context variables and database components like tOracleInput, tOracleOutput, tOracleClose, tFileCompare, and tFileCopy.
  • Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
  • Developed the ETL mappings using mapplets, reusable transformations, and various transformations such as source qualifier, expression, connected and unconnected lookup, router, aggregator, filter, sequence generator, update strategy, normalizer, joiner, and rank in PowerCenter Designer.
  • Created, altered, and deleted topics (Kafka queues) as required.
  • Performed performance tuning using partitioning and bucketing of Impala tables.
  • Worked with NoSQL databases such as HBase and MongoDB; involved in cluster maintenance and monitoring.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the UNIX file system to HDFS.
  • Created an e-mail notification service to alert the team that requested the data upon job completion.
  • Worked on NoSQL databases, which differ from classic relational databases.
  • Conducted requirements gathering sessions with various stakeholders
  • Involved in knowledge transition activities to the team members.
  • Successful in creating and implementing complex code changes.
  • Worked with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
  • Worked with S3, CloudFront, and Route 53.
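As a sketch of the Kafka and Spark Streaming integration described in the bullets above, the Java example below subscribes to a hypothetical clickstream topic through the Kafka 0.10 direct-stream API and counts events per micro-batch. The broker addresses, topic name, group id, and batch interval are assumed placeholders, not values from the actual project.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Sketch of a Spark Streaming job consuming a Kafka topic via the 0.10 direct stream.
// Broker list, topic, group id, and batch interval are hypothetical placeholders.
public class ClickstreamConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("clickstream-consumer");
        // Each micro-batch covers 10 seconds of incoming data.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092,broker2:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-group");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Collections.singletonList("clickstream");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count events per micro-batch and log the result; a real job would
        // transform the records and persist them to HDFS, HBase, or Hive.
        stream.map(ConsumerRecord::value)
              .count()
              .foreachRDD(rdd -> System.out.println("events in batch: " + rdd.first()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```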

Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse and Cloudera.

Confidential, Wilmington, DE

Hadoop Developer

Responsibilities:

  • Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
  • Wrote MapReduce jobs to perform operations such as copying data on HDFS, defining job flows on EC2 servers, and loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Java and Python shell commands per requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop (a Java sketch of this flow follows this list).
  • Experienced in performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Supported predictive analytics that can monitor inventory levels and ensure product availability.
  • Analyzed customers' purchasing behaviors.
  • Supported value-added services based on clients' profiles and purchasing habits.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Provided pivotal graphs to show the trends.
  • Maintained data import scripts using Hive and MapReduce jobs.
  • Developed and maintained several batch jobs that run automatically based on business requirements.
  • Performed unit testing and deployment, and monitored the performance of the solution for internal usage.
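A rough Java sketch of the Spark 1.6 aggregation flow referenced above: it reads a Hive table through HiveContext, aggregates with DataFrame operations, and stages the result on HDFS where a downstream Sqoop export could push it into the OLTP system. The learner_events table, its columns, and the staging path are hypothetical placeholders.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

// Sketch of a Spark 1.6 aggregation over a Hive table (HiveContext API).
// Table, column, and output path names are hypothetical placeholders.
public class LearnerAggregationJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("learner-aggregation");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hive = new HiveContext(jsc.sc());

        // Pull the source records registered in the Hive metastore.
        DataFrame events = hive.sql(
                "SELECT learner_id, course_id, score FROM learner_events");

        // Aggregate scores per learner and course.
        DataFrame totals = events
                .groupBy(col("learner_id"), col("course_id"))
                .agg(sum(col("score")).alias("total_score"));

        // Stage the result on HDFS; a downstream Sqoop export can push it to the OLTP store.
        totals.write()
              .mode(SaveMode.Overwrite)
              .parquet("hdfs:///staging/learner_totals/");

        jsc.stop();
    }
}
```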

Environment: Apache Hadoop, Hive, Pig, HDFS, Java MapReduce, Core Java, Scala, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Cloudera Distribution, Oracle and Teradata

Confidential, Carlsbad, CA

DATA ENGINEER

Responsibilities:

  • Exported data from HDFS to MySQL using Sqoop and an NFS mount approach.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Developed MapReduce programs for applying business rules to the data (a minimal sketch follows this list).
  • Developed and executed Hive queries for denormalizing the data.
  • Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
  • Installed and configured a Hadoop cluster for the development and testing environment.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Automated the workflow using shell scripts.
  • Performed performance tuning of Hive queries written by other developers.
  • Mastered major Hadoop distributions (HDP, CDH) and numerous open-source projects.
  • Prototyped various applications that utilize modern big data tools.
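A minimal sketch of the kind of rule-applying MapReduce program mentioned above: a map-only job that keeps delimited records whose amount field exceeds a threshold. The field layout, delimiter, and threshold are hypothetical and only illustrate the pattern.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only MapReduce sketch that applies a simple business rule to delimited records.
// Field position, delimiter, and threshold are hypothetical placeholders.
public class BusinessRuleFilterJob {

    public static class RuleMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            // Business rule: keep only records whose amount (third field) exceeds 1000.
            if (fields.length > 2 && Double.parseDouble(fields[2]) > 1000.0) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "business-rule-filter");
        job.setJarByClass(BusinessRuleFilterJob.class);
        job.setMapperClass(RuleMapper.class);
        job.setNumReduceTasks(0);                        // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```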

Environment: Linux, Java, Map Reduce, HDFS, DB2, Cassandra, Hive, Pig, Sqoop, FTP

Confidential, Dallas, Texas

Java Developer

Responsibilities:

  • Developed UI screens for a data entry application in Java Swing.
  • Worked on a backend service in Spring MVC and OpenEJB for interaction with Oracle and the mainframe using DAO and model objects.
  • Introduced Spring IoC to increase application flexibility and replace the need for hard-coded, class-based application functions.
  • Used Spring IoC for dependency injection to autowire different beans and data sources into the application.
  • Used Spring JDBC templates for database interactions and declarative Spring AOP transaction management (a small sketch follows this list).
  • Used mainframe screen scraping to add forms to the mainframe through the claims data entry application.
  • Worked on JasperReports (iReport 4.1.1) to generate reports for various roles (executive secretary and commissioners) based on their authorization.
  • Generated electronic letters for attorneys and insurance carriers using iReport.
  • Worked on application deployment to various Tomcat server instances using PuTTY.
  • Worked in TOAD against the Oracle database to write PL/SQL queries, functions, stored procedures, and triggers.
  • Worked on JSP, Servlets, HTML, CSS, JavaScript, JSON, jQuery, and AJAX for the Vault web-based project and the EDFP application.
  • Used the Spring MVC architecture with DispatcherServlet and a view resolver for the web applications.
  • Worked on web service integration for the EDFP project, integrating a third-party pay processing system with the EDFP application.
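A small sketch of the Spring JDBC template and Spring MVC usage described above: a JdbcTemplate-backed DAO exposed through a controller. The claims table, its columns, the URL mapping, and the view name are hypothetical placeholders, and the bean wiring (XML configuration or component scanning) is assumed to be handled elsewhere in the application context.

```java
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;

// Sketch of a JdbcTemplate-backed DAO and a Spring MVC controller.
// Table, columns, view name, and URL mapping are hypothetical placeholders.
class ClaimDao {
    private final JdbcTemplate jdbcTemplate;

    ClaimDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Fetch all claims filed by a given claimant as simple column/value maps.
    List<Map<String, Object>> findClaimsByClaimant(long claimantId) {
        return jdbcTemplate.queryForList(
                "SELECT claim_id, status, amount FROM claims WHERE claimant_id = ?",
                claimantId);
    }
}

@Controller
public class ClaimController {
    private final ClaimDao claimDao;

    public ClaimController(ClaimDao claimDao) {
        this.claimDao = claimDao;
    }

    // Renders a hypothetical "claims" view listing a claimant's claims;
    // the DispatcherServlet's view resolver maps the returned name to a JSP.
    @RequestMapping("/claims/{claimantId}")
    public String listClaims(@PathVariable long claimantId, Model model) {
        model.addAttribute("claims", claimDao.findClaimsByClaimant(claimantId));
        return "claims";
    }
}
```

With declarative transaction management, the DAO call would typically be wrapped by Spring AOP transaction advice configured in the application context rather than coded by hand.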

Confidential

Java Developer

Responsibilities:

  • Involved in Requirement gathering, Analysis and Design using UML and OOAD
  • Worked on the presentation layer using JSP, Servlets, and Struts.
  • Extensively used the Struts framework for MVC, UI design, and validations.
  • Created and deployed dynamic web pages using HTML, JSP, CSS and JavaScript
  • Worked on coding and deployment of EJB stateless session beans.
  • Interacted with Developers to follow up on Defects and Issues
  • Involved in the design and development of HTML presentation using XML, CSS, XSLT and XPath
  • Deployed J2EE web applications in BEA Weblogic.
  • Ported the Application onto MVC Model 2 Architecture in Struts Framework
  • Tested, reviewed, and troubleshot the applications.
  • Migrated existing flat-file data to a normalized Oracle database.
  • Used XML, XSD, DTD, and the SAX and DOM parsing APIs for XML-based documents used in information exchange (a SAX sketch follows this list).
  • Coded SQL and PL/SQL for backend processing and retrieval logic.
  • Tested, implemented, and installed the system.
  • Involved in building and deploying the application using the Ant build tool.
  • Used Microsoft Visual SourceSafe (VSS) and CVS as version control systems.
  • Worked on bug fixing and production support.
  • Responsible for coding, unit testing, functional testing, and regression testing of the systems.
  • Participated in technical discussions for architecture design, database, and code enhancements.
  • Participated in creating the demo site for the new ERE website release (Release 10).
  • Participated in full-cycle development of ERE Release 12 (started December 2008).
  • Provided production support: troubleshooting CAPRS system issues, system-generated errors, and other problems related to ERE.
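A brief sketch of the SAX-based XML parsing referenced above: a DefaultHandler that collects the text of every <caseNumber> element from an exchanged document. The element name and input file name are hypothetical placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// SAX parsing sketch: collects the text of every <caseNumber> element.
// The element name and input file are hypothetical placeholders.
public class CaseNumberExtractor extends DefaultHandler {
    private final List<String> caseNumbers = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inCaseNumber = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("caseNumber".equals(qName)) {
            inCaseNumber = true;
            current.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCaseNumber) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("caseNumber".equals(qName)) {
            caseNumbers.add(current.toString().trim());
            inCaseNumber = false;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CaseNumberExtractor handler = new CaseNumberExtractor();
        parser.parse(new File("exchange.xml"), handler);   // hypothetical input file
        for (String caseNumber : handler.caseNumbers) {
            System.out.println(caseNumber);
        }
    }
}
```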
