
Hadoop/Spark Developer Resume

Plano, TX

SUMMARY

  • Around 8 years of extensive IT experience in all phases of the Software Development Life Cycle (SDLC), including 4+ years of strong experience working on the Apache Hadoop ecosystem and Apache Spark.
  • Hadoop Stack
  • Worked extensively with Hadoop distributions like Cloudera and Hortonworks.
  • In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode and DataNode, and of MRv1 and MRv2 concepts.
  • Experience in importing and exporting data from different RDBMS Servers like MySQL, Oracle and Teradata into HDFS and Hive using Sqoop.
  • Experience in ingesting data from FTP/SFTP servers using Flume.
  • Experience in developing Kafka consumer applications in Scala using the Spark Streaming API.
  • Data Processing
  • Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
  • Experience in designing table partitioning and bucketing, and in optimizing Hive scripts using various performance utilities and techniques.
  • Experience in developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
  • Experience in designing tables and views for reporting using Impala.
  • Experienced in developing Spark applications using the Spark Core, Spark SQL and Spark Streaming APIs.
  • Experience in creating DStreams from sources like Flume and Kafka and applying Spark transformations and actions on them.
  • Work Flows
  • Rich experience in automating Sqoop and Hive queries using Oozie workflows.
  • Experience in scheduling jobs using the Oozie Coordinator, Oozie Bundles and crontab.
  • Cloud Infrastructure
  • Experience with AWS components like Amazon EC2 instances, S3 buckets and CloudFormation templates.
  • File Formats
  • Experienced in working with different file formats - Avro, Parquet, RCFile and ORC.
  • Experience with different compression codecs like Gzip, LZO, Snappy and Bzip2; a short sketch combining formats and compression follows this list.
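
A minimal Spark (Scala) sketch of the file-format and compression work above: it reads Avro input and rewrites it as Snappy-compressed Parquet in a partitioned Hive table. The paths, database, table and column names are hypothetical, and it assumes the spark-avro package is on the classpath.

    import org.apache.spark.sql.SparkSession

    object AvroToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-to-parquet")
          .enableHiveSupport() // needed to create and write Hive tables
          .getOrCreate()

        // Read Avro input (requires the spark-avro package on the classpath).
        val df = spark.read.format("avro").load("hdfs:///staging/events_avro")

        // Rewrite as Snappy-compressed Parquet, partitioned by a hypothetical date column.
        df.write
          .partitionBy("event_date")
          .option("compression", "snappy")
          .format("parquet")
          .saveAsTable("analytics.events_parquet") // hypothetical Hive database.table

        spark.stop()
      }
    }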

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, ZooKeeper, Spark, Cloudera and Hortonworks

Hadoop Paradigms: MapReduce, YARN, In-memory computing, High Availability, Real-time Streaming

Programming Languages: SQL, Java, J2EE, Scala and Unix shell scripting

Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; familiar with NoSQL (HBase)

Cloud Components: AWS (S3 buckets, EMR, EC2, CloudFormation), Azure (SQL Database & Data Factory)

Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira.

PROFESSIONAL EXPERIENCE

Confidential - Plano, TX

Hadoop/Spark Developer

Responsibilities:

  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS through processing and analyzing it in HDFS.
  • Developed a Spark application to import data from Teradata into HDFS and created Hive tables over it.
  • Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from the Avro tables.
  • Involved in running the Hive scripts through Hive and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Developed a Flume ETL job with an HTTP source and an HDFS sink.
  • Collected JSON data from the HTTP source and developed Spark routines to perform inserts and updates on Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
  • Developed Kafka consumers in Scala for consuming data from Kafka topics; a sketch appears after this list.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Integrated Hive and Tableau Desktop reports and published to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Used Jira for bug tracking and SVN to check-in and checkout code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
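
A minimal sketch of the Kafka consumer work mentioned above, using the standard spark-streaming-kafka-0-10 API in Scala. The broker address, group id and topic name are hypothetical, and the per-batch count stands in for the real transformations.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaConsumerJob {
      def main(args: Array[String]): Unit = {
        // 10-second micro-batches.
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-consumer"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092", // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "ingest-group",          // hypothetical consumer group
          "auto.offset.reset" -> "latest"
        )

        // DStream over a hypothetical "events" topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Count records per batch as a stand-in for the real transformations and actions.
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }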

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Confidential - SFO, CA

Hadoop/Spark Developer

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats like JSON, CSV, TSV etc.
  • Imported data from sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce.
  • Imported and exported data between environments such as MySQL and HDFS, and deployed the results to production.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Worked on partitioning and bucketing Hive tables and set tuning parameters to improve performance.
  • Used Oozie workflow scheduler templates to manage various job types such as Sqoop, MapReduce, Pig, Hive and shell scripts.
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment; a sketch appears after this list.
  • Actively participated in code reviews and meetings and resolved technical issues.
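
A minimal sketch of the Hive-to-Spark migration POC mentioned above: the same HiveQL an ETL previously ran on Hive is executed through Spark SQL on YARN. The database, table and column names are invented for illustration.

    import org.apache.spark.sql.SparkSession

    object HiveToSparkPoc {
      def main(args: Array[String]): Unit = {
        // Submitted with --master yarn; Hive support lets Spark read the metastore.
        val spark = SparkSession.builder()
          .appName("hive-etl-on-spark")
          .enableHiveSupport()
          .getOrCreate()

        // The same HiveQL the original ETL ran, now executed by Spark's engine.
        val daily = spark.sql(
          """SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
            |FROM sales.orders
            |GROUP BY order_date""".stripMargin)

        daily.write.mode("overwrite").saveAsTable("sales.daily_orders")
        spark.stop()
      }
    }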

Environment: Apache Hadoop, MapReduce, Hive, Pig, Sqoop, Apache Spark, ZooKeeper, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.

Confidential - Milwaukee, WI

Java/Hadoop developer

Responsibilities:

  • Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
  • Imported datasets from sources such as Oracle and MySQL into HDFS and Hive using Sqoop on a daily basis.
  • Installed and configured Hive on Hadoop cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; a sketch appears after this list.
  • Developed and ran MapReduce jobs on the YARN cluster to produce daily and monthly reports per business requirements.
  • Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
  • Experienced in developing multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON and CSV.
  • Developed Hive views for requirement analysis and created Hive tables to store the processed data.
  • Comprehensive knowledge and experience in process improvement, normalization/denormalization, data extraction, data cleansing and data manipulation.
  • Performed Data transformations in Hive and used partitions, buckets for performance improvements.
  • Utilized cluster co-ordination services through ZooKeeper.
  • Participated in the requirement gathering and analysis phase of the project, documenting the business requirements through workshops and meetings with business users.
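
The cleansing jobs referenced above were written in Java; for consistency with the other sketches here, an equivalent map-only cleansing job is shown in Scala against the same Hadoop MapReduce API. The five-column CSV layout and the input/output paths are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map-only cleansing: drop malformed CSV rows and trim fields.
    class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split(",", -1).map(_.trim)
        // Keep only well-formed rows; the 5-column layout is hypothetical.
        if (fields.length == 5 && fields.forall(_.nonEmpty))
          context.write(NullWritable.get, new Text(fields.mkString(",")))
      }
    }

    object CleanseJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "csv-cleanse")
        job.setJarByClass(classOf[CleanseMapper])
        job.setMapperClass(classOf[CleanseMapper])
        job.setNumReduceTasks(0) // map-only: cleansing needs no aggregation
        job.setOutputKeyClass(classOf[NullWritable])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }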

Environment: MapReduce, Java, Hadoop, Cloudera, Pig, Hive, Oozie, Sqoop, Oracle, ZooKeeper, Eclipse and UNIX Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Participated in the requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
  • Involved in developing the application using Core Java, J2EE and JSPs.
  • Developed a web-based application, EMR, on the J2EE framework, using Hibernate for persistence, Spring for dependency injection and JUnit for testing.
  • Integrated a REST API with Spring, consuming resources using Spring's RestTemplate, and developed a RESTful web services interface to a Java-based runtime engine and accounts; a sketch appears after this list.
  • Used JSP to develop the front-end screens of the application.
  • Built the admin module using Struts framework for the master configuration.
  • Used Struts tiles to display the front-end pages in a neat and efficient way.
  • Designed and developed several SQL Scripts, Stored Procedures, Packages and Triggers for the Database.
  • Developed nightly batch jobs which involved interfacing with external third party state agencies.
  • Developed test scripts for performance and accessibility testing of the application.
  • Responsible for deploying the application in client UAT environment.
  • Prepared installation documents of the software, including Program Installation Guide and Installation Verification Document.
  • Involved in different types of testing, such as unit, system and integration testing, carried out during the testing phase.
  • Provided production support to maintain the application.
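
A minimal sketch of consuming a REST resource with Spring's RestTemplate, as mentioned above. The project code was Java; this illustration is in Scala for consistency with the earlier sketches, and the endpoint URL is hypothetical.

    import org.springframework.web.client.RestTemplate

    object AccountClient {
      def main(args: Array[String]): Unit = {
        val rest = new RestTemplate()

        // Fetch a single account resource from a hypothetical endpoint as a raw JSON string.
        val body: String = rest.getForObject("http://localhost:8080/api/accounts/42", classOf[String])
        println(body)
      }
    }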

Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, MyEclipse, PL/SQL, WebSphere, UML, Toad, Windows.

Confidential

Jr. Java Developer

Responsibilities:

  • Involved in Designing and Coding.
  • Used RAD to develop, test and deploy all the Java components.
  • Performed client-side validations using JavaScript.
  • Developed (specified, created, modified, maintained and tested) software components of the project on the assigned technology platform.
  • Corrected complicated defects and made major enhancements to resolve customer problems.
  • Developed Presentation Screens using Struts view tags.
  • Developed scalable applications in a dynamic environment, primarily using Java, Spring, web services and object/relational mapping tools.
  • Worked in both UNIX and Windows environments.
  • Developed and modified databases as needed to support application development, and continually provided support for internally developed applications.
  • Developed technical architecture documentation based on business requirements.
  • Enhanced and maintained the existing application suite.
  • Communicated development status to technology team members on a regular basis.

Environment: Java Servlets, J2EE, Spring, Struts, Hibernate, Eclipse IDE, RAD, JDBC, Web Services, SQL, HTML, DHTML, XSLT, Oracle, SOAP, Agile (Scrum) and CSS.
