- Around 8 years of extensive IT experience in all phases of the Software Development Life Cycle (SDLC), including 4+ years of strong experience with the Apache Hadoop ecosystem and Apache Spark.
- Hadoop Stack
- Worked extensively with Hadoop distributions such as Cloudera and Hortonworks.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node, and MRv1/MRv2 concepts.
- Experience importing data from RDBMS servers such as MySQL, Oracle and Teradata into HDFS and Hive, and exporting it back, using Sqoop.
- Experience in ingesting data from FTP/SFTP servers using Flume.
- Experience developing Kafka consumers in Spark applications written in Scala.
- Data Processing
- Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
- Experience designing table partitioning and bucketing, and optimizing Hive scripts using various performance utilities and techniques.
- Experience developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
- Experience in designing tables and views for reporting using Impala.
- Experienced in developing Spark applications using the Spark Core, Spark SQL and Spark Streaming APIs.
- Experience creating DStreams from sources such as Flume and Kafka and performing Spark transformations and actions on them.
- Work Flows
- Rich experience in automating Sqoop and Hive queries using Oozie workflow.
- Experience scheduling jobs using Oozie coordinators, bundles and crontab.
- Cloud Infrastructure
- Experience with AWS components such as EC2 instances, S3 buckets and CloudFormation templates.
- File Formats
- Experienced in working with different file formats: Avro, Parquet, RCFile and ORC.
- Experience with different compression codecs such as Gzip, LZO, Snappy and Bzip2.
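As a concrete illustration of the Sqoop-based ingestion described above, a typical import invocation looks roughly like the following sketch (the JDBC URL, credentials, table name and HDFS paths are hypothetical placeholders; this assumes a Hadoop cluster with Sqoop installed):

```shell
# Hypothetical Sqoop import from MySQL into HDFS as Avro
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --as-avrodatafile \
  --num-mappers 4
```

The `--as-avrodatafile` flag lands the data in Avro format, and `--num-mappers` controls the parallelism of the import.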
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, Zookeeper, Spark, Cloudera and Hortonworks
Hadoop Paradigms: Map Reduce, YARN, In-memory computing, High Availability, Real-time Streaming
Programming Languages: SQL, Java, J2EE, Scala and Unix shell scripting
Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; familiar with NoSQL (HBase)
Cloud Components: AWS (S3 buckets, EMR, EC2, CloudFormation), Azure (SQL Database & Data Factory)
Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira.
Confidential - Plano, TX
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing that data in HDFS.
- Developed Spark applications to import data from Teradata into HDFS and created Hive tables.
- Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from the Avro tables.
- Involved in running Hive scripts through Hive, Hive on Spark and Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job with an HTTP source and an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates on Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
- Developed Kafka consumer APIs in Scala to consume data from Kafka topics.
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and SVN to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
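The partitioned, bucketed Parquet tables with Snappy compression mentioned above are typically declared roughly as follows (table and column names are illustrative placeholders; this requires a Hive metastore, so it is a sketch rather than a runnable script):

```sql
-- Illustrative Hive DDL; table and column names are hypothetical
CREATE TABLE sales_parquet (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (order_id) INTO 32 BUCKETS
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');

-- Load from an Avro-backed staging table using dynamic partitioning
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_parquet PARTITION (order_date)
SELECT order_id, amount, order_date FROM sales_avro;
```

Partitioning by `order_date` prunes scans for date-bounded queries, while bucketing by `order_id` helps bucketed map joins and sampling.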
Confidential - SFO, CA
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Worked with different source data file formats like JSON, CSV, TSV etc.
- Imported data from sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Used Pig as an ETL tool for transformations, event joins and pre-aggregations before storing the data in HDFS.
- Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
- Built Oozie workflow scheduler templates to manage various jobs such as Sqoop, MR, Pig, Hive and shell scripts.
- Involved in importing and exporting data from HBase using Spark.
- Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.
- Actively participated in code reviews and meetings and resolved technical issues.
Environment: Apache Hadoop, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.
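An Oozie workflow of the kind described above chains Sqoop and Hive actions roughly as in this sketch (the workflow name, properties and script paths are hypothetical placeholders):

```xml
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table orders --target-dir ${rawDir}</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Ingest failed</message></kill>
  <end name="end"/>
</workflow-app>
```

An Oozie coordinator would then trigger this workflow on a daily schedule, passing the parameterized properties at submission time.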
Confidential - Milwaukee, WI
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
- Imported datasets with Sqoop from sources such as Oracle and MySQL into HDFS and Hive on a daily basis.
- Installed and configured Hive on Hadoop cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed and ran MapReduce jobs on the YARN/Hadoop cluster to produce daily and monthly reports per business requirements.
- Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
- Developed multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON and CSV.
- Developed Hive views for requirement analysis and created Hive tables to store the processed data.
- Comprehensive knowledge and experience in process improvement, normalization/denormalization, data extraction, data cleansing and data manipulation.
- Performed Data transformations in Hive and used partitions, buckets for performance improvements.
- Utilized cluster co-ordination services through ZooKeeper.
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with business users.
Environment: MapReduce, Java, Hadoop, Cloudera, Pig, Hive, Oozie, Sqoop, Oracle, ZooKeeper, Eclipse and UNIX Shell Scripting.
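The MapReduce-style cleansing and aggregation described above can be illustrated outside Hadoop with a plain-Java sketch of the same map (tokenize) / reduce (count per key) pattern; the class and method names are hypothetical, not from an actual project codebase:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Splits lines into lowercase words and counts occurrences -- the same
    // pattern a Hadoop word-count job implements with a Mapper emitting
    // (word, 1) pairs and a Reducer summing them, here via Java streams.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hive and Spark", "Spark on YARN");
        System.out.println(wordCounts(lines));
    }
}
```

In a real Hadoop job the grouping step is performed by the shuffle phase between Mapper and Reducer rather than in memory.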
- Participated in the requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
- Involved in developing the application using Core Java, J2EE and JSPs.
- Worked on this web-based application, entitled EMR, in a J2EE framework using Hibernate for persistence, Spring for dependency injection and JUnit for testing.
- Integrated REST APIs with Spring, consuming resources using Spring RestTemplate, and developed RESTful web services interfacing with a Java-based runtime engine and accounts.
- Used JSP to develop the front-end screens of the application.
- Built the admin module using Struts framework for the master configuration.
- Used Struts tiles to display the front-end pages in a neat and efficient way.
- Designed and developed several SQL Scripts, Stored Procedures, Packages and Triggers for the Database.
- Developed nightly batch jobs which involved interfacing with external third party state agencies.
- Developed test scripts for performance and accessibility testing of the application.
- Responsible for deploying the application in client UAT environment.
- Prepared installation documents of the software, including Program Installation Guide and Installation Verification Document.
- Involved in different types of testing, such as unit, system and integration testing, carried out during the testing phase.
- Provided production support to maintain the application.
Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, MyEclipse, PL/SQL, WebSphere, UML, Toad, Windows.
Jr. Java Developer
- Involved in designing and coding.
- Used RAD to develop, test and deploy all the Java components.
- Developed (specified, created, modified, maintained and tested) software components that are part of the software project on the assigned technology platform.
- Corrected complicated defects and made major enhancements to resolve customer problems.
- Developed Presentation Screens using Struts view tags.
- Developed scalable applications in a dynamic environment, primarily using Java, Spring, web services and object/relational mapping tools.
- Worked in both UNIX and Windows environments.
- Developed and modified databases as needed to support application development, and continually provided support for internally developed applications.
- Developed technical architecture documentation based on business requirements.
- Enhanced and maintained the existing application suite.
- Communicated development status on a regular basis to technology team members.
Environment: Java Servlets, J2EE, Spring, Struts, Hibernate, Eclipse IDE, RAD, JDBC, Web Services, SQL, HTML, DHTML, XSLT, Oracle, SOAP, Agile (Scrum) and CSS.