Hadoop Developer Resume
Plymouth, MN
SUMMARY:
- 6 years of professional experience in the development, deployment, maintenance and support of various projects in large organizations.
- Strong experience with Big Data and Hadoop technologies, with excellent knowledge of the Hadoop ecosystem: Hive, Spark, Kafka, Sqoop, Pig, HBase, Oozie, and Talend.
- Deep knowledge of Hadoop architecture (HDFS, YARN, MapReduce), including its internal operations.
- Worked with Hadoop distributions including MapR, Cloudera, and Hortonworks.
- Experience in the AWS cloud environment.
- Hands-on experience with VPC, EC2, EMR, S3, Redshift, CloudWatch, and SNS.
- Experienced with Spark; performed various transformations and actions on large datasets using RDDs.
- Experienced in improving the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
- Experience capturing data and importing it into HDFS, using Kafka for semi-structured data and Sqoop for existing relational databases.
- Experience importing real-time data into Hadoop using Kafka and implementing Oozie jobs for daily imports.
- Analyzed large datasets using Hive queries and Pig scripts.
- Expertise in Hive partitioning and bucketing concepts.
- Experienced with job workflow scheduling and monitoring tools such as Oozie.
- Worked with Talend Open Studio data integration, Big Data integration, and data preparation tools. Designed and ran ETL jobs using Talend Open Studio.
- Imported and exported data between RDBMS and HDFS using Sqoop.
- Exposure to file formats such as SequenceFile, ORC, Parquet, and JSON.
- Worked with NoSQL databases including HBase.
- Good understanding of HDFS design, daemons, federation, and HDFS high availability (HA).
- Well versed in designing and implementing MapReduce jobs in Java using Eclipse to solve real-world scaling problems.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, and JDBC.
- Basic knowledge of UNIX and shell scripting.
- Practiced story-driven Agile development and actively participated in daily Scrum meetings.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and result-oriented, with problem-solving and leadership skills.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, Spark, Scala, Kafka, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Talend.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, HTML, CSS.
IDEs and Tools: Eclipse, SVN, Apache Ant, Log4j, Maven, JUnit, WinSCP.
NoSQL: HBase.
DB Languages: SQL.
Application Server: Tomcat
Programming Languages: C, Java, Shell Scripting.
Operating Systems: Linux, Windows XP, Windows 7, MS-DOS.
PROFESSIONAL EXPERIENCE:
Confidential, Plymouth, MN
Hadoop Developer
Responsibilities:
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Implemented Kafka for streaming data, and filtered and processed the data.
- Developed a data pipeline using Kafka, Sqoop, and Hive to ingest transactional data into HDFS for analysis.
- Developed an ingestion framework to read mainframe files and create Hive snapshot tables on EDP.
- Created Hive tables based on business requirements. Wrote many Hive queries and UDFs, and implemented partitioning and bucketing for efficient data access.
- Created Hive tables in Parquet and ORC file formats using Snappy and Gzip compression.
- Developed Spark code using Scala and Spark SQL for faster processing. Responsible for ingesting data into EDP.
- Developed Oozie workflows to automate tasks.
- Involved in QA, test data creation, and unit testing activities.
- Involved in the design, development, and testing phases of the Software Development Life Cycle.
- Utilized Agile Scrum methodology to help manage and organize the team, with regular code review sessions.
Environment: Hadoop, Spark, Scala, Kafka, YARN, Hive, Oozie, Sqoop, Hortonworks.
Confidential, Eden Prairie, MN
Hadoop Developer
Responsibilities:
- Analyzed data and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded data from the edge node into HDFS using shell scripts.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Developed Pig scripts using Pig Latin.
- Handled importing data from web logs, MySQL, and various other data sources using Sqoop.
- Designed and created ETL jobs in Talend to load large volumes of data into HBase, the Hadoop ecosystem, and relational databases.
- Developed a test automation framework using Talend for record count checks, duplicate checks, field-level validation, and SCD2 validation.
- Developed Spark and Spark SQL code to extract data from the data lake to our tenant, replicating the Talend functionality.
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Implemented Spark jobs in Scala using the DataFrame and Spark SQL APIs for faster data processing.
- Wrote shell scripts to automate jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Supported setting up the QA environment and updating configurations for implementing Pig and Sqoop scripts.
Environment: Apache Hadoop, Apache Spark, Scala, Spark SQL, MapReduce, HDFS, Hive, Java, Pig, HBase, Teradata, Talend, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL connector, MapR.
Confidential, Basking Ridge, NJ
Hadoop Developer
Responsibilities:
- Analyzed data coming from various sources and created meta files and control files to ingest the data into the data lake.
- Configured a batch job to ingest the source files into the data lake.
- Created several jobs in the Talend ETL tool to perform transformations on source files.
- Used Pig to transform data in HDFS to fit the requirements.
- Created several Pig UDFs for the enrichment engine, which were used to enrich the data.
- Developed Hive queries to load data into HBase.
- Used Hive queries to create ORC tables that improved performance for reporting.
- Worked extensively with Hive to create, alter, and drop tables, and wrote Hive queries.
- Created and altered HBase tables on top of data residing in the data lake.
- Designed and developed reference table engine frameworks in Talend using Hadoop tools such as HDFS, Hive, HBase, and MapReduce.
- Worked with Talend components for transformation, file processing, Java, Unix, database access, and logging.
- Worked closely with system analysts and architects to design and develop Talend jobs that fit business requirements.
- Scheduled jobs in Talend.
- Followed Agile methodology using Rally.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Sqoop, MapR, Talend, Core Java, Eclipse, Linux.
Confidential, Burlington, MA
Hadoop Developer
Responsibilities:
- Involved in migrating data from slough to AWS using ETL.
- Responsible for creating Hive tables based on business requirements.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked in the AWS cloud environment.
- Hands-on experience with VPC, EC2, S3, EMR, Redshift, Data Pipeline, CloudWatch, and SNS.
- Demonstrated analytical and problem-solving skills, particularly those that apply to a Big Data environment.
- Developed scripts that improved project performance by automating data management end to end, with embedded monitoring logic using CloudWatch and SNS.
- Worked on EMR to convert raw data to derived formats and to transfer data from one server to another.
- Used SQL Workbench to load and aggregate data from S3 into Redshift.
- Imported and exported data into HDFS and Hive using Flume.
- Tested Tableau dashboard performance by measuring response times.
- Expert knowledge of developing and debugging in Java/J2EE.
- Worked hands-on with ETL processes using Python and Java.
- Migrated all on-premises data from Salo, Oracle, and MySQL to Amazon Redshift using Python and the Attunity tool on an Amazon EC2 instance.
- Developed data pipelines to process data from the source systems directly into the Redshift database.
- Wrote MapReduce jobs and integrated them with Oozie workflows for batch processing of large datasets.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Utilized Agile Scrum methodology to help manage and organize the team, with regular code review sessions and daily stand-ups.
Environment: Hadoop, HDFS, Hue, MapReduce, Hive, Pig, Sqoop, AWS, VPC, EC2, S3, EMR, Redshift, Data Pipeline, CloudWatch, SNS, Splunk, SQL Server, MySQL, HBase, MongoDB, UNIX Shell Scripting.
Confidential, Roseville, CA
Hadoop Developer
Responsibilities:
- Responsible for building data solutions in Hadoop using the Cascading framework.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Worked hands-on with ETL processes.
- Upgraded the Hadoop cluster from CDH3 to CDH4 and integrated Hive with existing applications.
- Configured Ethernet bonding on all nodes to double the network bandwidth.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Used Python and shell scripts to automate the end-to-end ELT process.
- Analyzed data using Hive queries and Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Teradata, Cloudera Manager, Pig, Sqoop, Oozie, Python.
Confidential, Dallas, TX
Java/J2EE Developer
Responsibilities:
- Designed and developed modules on both the client and server sides.
- Developed the UI using JSP, JavaScript, and HTML.
- Validated data on the client side using JavaScript.
- Interacted with external services to retrieve user information using SOAP web service calls.
- Developed web components using JSP, Servlets and JDBC.
- Designed the controller using Servlets.
- Accessed backend database Oracle using JDBC.
- Developed UNIX shell scripts to automate various tasks.
- Developed user and technical documentation.
Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.
