Big Data/Hadoop Developer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- 6+ years of IT experience in software development, big data management, data modeling, data integration, and implementation and testing of enterprise-class systems spanning big data frameworks, advanced analytics and Java/J2EE technologies.
- 3+ years of hands-on experience with Hadoop components and MapReduce programming for parsing and populating tables over terabytes of data.
- Extensive use of Sqoop, Flume and Oozie for data ingestion into HDFS and the Hive warehouse.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Hands-on experience with performance tuning for data processing in Hive, Impala, Spark, Pig and MapReduce, using techniques including dynamic partitioning, bucketing and file compression (see the sketch after this list).
- Experience with data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Expertise in ingesting data from HBase into Solr.
- Extensively worked on debugging using Eclipse debugger.
- Experienced in importing data from various sources using StreamSets.
- Experience with Cloudera, Hortonworks & MapR Hadoop distributions.
- Strong work ethic with a desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication and interpersonal skills; a good team player.
- Motivated to take on independent responsibility while contributing as a productive team member.
- A pleasing personality with the ability to build great rapport with clients and customers.
- Excellent verbal and written communication, paired with strong presentation and interpersonal skills.
- Strong leadership qualities, backed by a solid track record as a team player.
- Up to date with the latest business and technology trends.
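As a minimal illustration of the partitioning, bucketing and compression techniques listed above, the sketch below submits hypothetical HiveQL through the Hive CLI (hive -e) from Python. The table and column names (sales_raw, sales_opt, order_id, amount, order_date) and the bucket count are placeholders for illustration only, not details from a specific engagement.

    import subprocess

    # Hypothetical HiveQL: a partitioned, bucketed, ORC-compressed table
    # populated with dynamic partitioning from a raw staging table.
    HQL = """
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS sales_opt (
      order_id BIGINT,
      amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (order_id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('orc.compress' = 'SNAPPY');

    INSERT OVERWRITE TABLE sales_opt PARTITION (order_date)
    SELECT order_id, amount, order_date FROM sales_raw;
    """

    # Submit the inline script through the Hive CLI.
    subprocess.run(["hive", "-e", HQL], check=True)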
Spark & Transaction Processing
- Hands-on experience with Spark SQL for various business use cases.
- Used Spark SQL and Scala APIs for querying and transforming data residing in Hive.
- Used Python for Spark SQL jobs to speed up data processing (a minimal sketch follows this list).
- Replaced existing MapReduce jobs with Spark Streaming and Spark data transformations for more efficient data processing.
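A minimal PySpark sketch of the Spark SQL usage described above: query a Hive table, apply a transformation, and write the aggregated result back to Hive. The database, table and column names (warehouse.orders, customer_id, amount, warehouse.order_totals) are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Spark session with Hive support so spark.sql can see the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-spark-sql-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Query a Hive table with Spark SQL, transform it, and persist the result
    # back into Hive as a managed table.
    orders = spark.sql("SELECT customer_id, amount FROM warehouse.orders")
    totals = orders.groupBy("customer_id").sum("amount")
    totals.write.mode("overwrite").saveAsTable("warehouse.order_totals")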
Core Competencies
- Hadoop Development & Troubleshooting
- Data Analysis
- Data Visualization & Reporting in Tableau
- Real-time Streaming using Spark.
- Map Reduce Programming
- Performance Tuning of Hive & Impala
- Ingesting data from HBase to Solr
- Data import using StreamSets.
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, Solr, StreamSets.
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala.
ETL Tools: Informatica with Hadoop connector, Pentaho, Alteryx
Programming/Scripting Languages: Java, C, Scala, SQL, Unix Shell Scripting, Python
Java Technologies: jQuery, JSP, Servlets.
SQL Databases: Oracle, SQL Server 2012, SQL Server 2008 R2, DB2, Teradata
NoSQL Databases: MongoDB, HBase.
Development tools: Maven, Eclipse, IntelliJ, PyCharm
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Big Data/Hadoop Developer
Responsibilities:
- Developed and supported MapReduce programs running on the cluster.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Created Hive tables and worked on them using HiveQL.
- Involved in installing Hadoop Ecosystem components.
- Validated NameNode and DataNode status in the HDFS cluster.
- Handled 2 TB of data volume and implemented the same in Production.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Managed and reviewed Hadoop log files.
- Responsible for managing data coming from various sources.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Wrote MapReduce jobs using the Java API (a minimal streaming-style sketch follows this list).
- Wrote Hive queries for data analysis to meet the business requirements.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed UDFs for Pig data analysis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Used JUnit for unit testing and Continuum for integration testing.
- Worked hands on with ETL process.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
- Configured Ethernet bonding on all nodes to double the network bandwidth.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Imported source/target tables from the respective SAP R/3 and BW systems, created reusable transformations (Joiner, Router, Lookup, Rank, Filter, Expression and Aggregator) inside Mapplets, and built new mappings using the Designer module of Informatica PowerCenter to implement the business logic and load the customer healthcare data incrementally and in full.
- Created complex mappings using Unconnected Lookup, Aggregator and Router transformations to populate target tables efficiently.
- Optimized the mappings using various optimization techniques and debugged existing mappings with the Debugger to test and fix them.
- Updated mappings, sessions and workflows as part of ETL changes.
- Modified existing ETL code and documented the changes.
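The MapReduce work above used the Java API; purely as an illustration of the same map/reduce pattern in Python (also listed in this resume's skill set), here is a minimal Hadoop Streaming-style word-count sketch. The jar name and input/output paths in the usage comment are hypothetical.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming sketch (word count). Illustrative usage:
    #   hadoop jar hadoop-streaming.jar -files wordcount.py \
    #     -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    #     -input /data/in -output /data/out
    import sys

    def mapper():
        # Emit <word, 1> for every token read from standard input.
        for line in sys.stdin:
            for word in line.split():
                print("%s\t%d" % (word, 1))

    def reducer():
        # Streaming delivers keys sorted, so counts can be accumulated
        # per key and flushed whenever the key changes.
        current, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t", 1)
            if word != current:
                if current is not None:
                    print("%s\t%d" % (current, count))
                current, count = word, 0
            count += int(value)
        if current is not None:
            print("%s\t%d" % (current, count))

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()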
Environment: Java, Hadoop, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera CDH3/4 Distribution, Informatica 9.1
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Created external Hive tables to move data from different sources into the Cloudera cluster.
- Kept track of the data once loaded, with weekly and daily updates.
- Performed SQL joins across Hive tables to consolidate them into a single table.
- Ingested data from Hive to HBase and from HBase to Solr using Spark.
- Worked on ingesting data from various sources such as JDBC into Hive using StreamSets and Sqoop jobs.
- Imported data from Hive to Solr using StreamSets.
- Set up near-real-time indexing into Solr as an automated, scheduled process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Involved in developing ETL data pipelines for real-time streaming data using Kafka and Spark.
- Worked on a POC to pull in third-party data, used Spark SQL to create a schema RDD, and loaded it into Hive tables as structured data.
- Imported and exported data into HDFS, Pig, Hive and HBase using Sqoop.
- Managed and reviewed Hadoop log files.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data into the Hadoop system.
- Responsible for managing data coming from different data sources.
- Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
- Worked closely with admins to set up Kerberos authentication.
- Wrote test cases to test software throughout development cycles, including functional testing, unit testing and continuous integration.
- Managed operations, monitoring and troubleshooting for all Hadoop development and production issues.
- Developed and debugged performance testing and tuning in the existing application.
- Wrote detailed design specification documents and implemented business rules.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Interacted closely with web developers on application usage and on pulling data from Solr and HBase to populate the front end.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (a minimal sketch follows this list).
- Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables, handling structured data with Spark SQL.
- Implemented Spark SQL queries using Python for faster data processing.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation for the BI team.
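The Kafka-to-HDFS bullet above was implemented in Scala; the sketch below shows the same pattern in Python using Spark Streaming's legacy Kafka direct stream. The broker address, topic name, batch interval and HDFS path are hypothetical, and the spark-streaming-kafka package is assumed to be on the classpath.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-to-hdfs-sketch")
    ssc = StreamingContext(sc, 30)  # 30-second micro-batches

    # Direct stream from Kafka; each record arrives as a (key, value) pair.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    # Keep only the message payload and append each micro-batch to HDFS.
    stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/streams/events")

    ssc.start()
    ssc.awaitTermination()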
Environment: Hive, HDFS, HBase, Solr, StreamSets, Spark, Kafka, Sqoop, Scala, IntelliJ, Python, PyCharm.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Developed data pipelines using Sqoop and Flume to store data in HDFS for further processing through Spark.
- Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
- Implemented partitioning and bucketing in Hive, working with file formats and compression techniques for optimization.
- Wrote Python scripts to convert Autosys jobs and HDFS directory paths from old standards to new standards.
- Wrote Python scripts to collect the YARN job list for performance metrics (see the sketch after this list).
- Created Hive generic UDFs to process business logic that varies by policy.
- Experience customizing the MapReduce framework at different levels, such as input formats, data types, custom SerDes and partitioners.
- Pushed the data to Windows mount location for Tableau to import it for reporting.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and streamed data using Spark Streaming.
- Worked on joins to create Hive lookup tables.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Analyzed large data sets by running Hive query scripts.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed Hive scripts passing dynamic parameters using hivevar.
- Created partitioned tables in Hive for best performance and faster querying.
- Configured build scripts for multi module projects with Maven.
- Automated the process of scheduling workflow using Oozie and Autosys.
- Prepared Unit test cases and performed unit testing.
- Created external and partitioned tables in Hive for querying purposes.
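A minimal sketch of the YARN job-list metrics script mentioned above: it shells out to the `yarn application -list` command and tallies applications by state. The assumption that the state is the sixth tab-separated field follows the CLI's default report layout and is noted in the code.

    import subprocess
    from collections import Counter

    def yarn_application_rows(states="ALL"):
        # Return tab-split rows from `yarn application -list -appStates <states>`,
        # skipping the report's header lines.
        out = subprocess.check_output(
            ["yarn", "application", "-list", "-appStates", states],
            universal_newlines=True)
        return [line.split("\t") for line in out.splitlines()
                if line.startswith("application_")]

    if __name__ == "__main__":
        rows = yarn_application_rows()
        # Assumes the default column order, where the application state
        # is the sixth field of each row.
        print("total applications:", len(rows))
        print(Counter(row[5].strip() for row in rows if len(row) > 5))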
Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell scripting, Impala, Eclipse, Tableau, MySQL.
Confidential
Java Developer
Responsibilities:
- Involved in requirement Analysis, Designing, Coding and Testing.
- Developed the application following the Agile Scrum methodology.
- Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB and Action classes.
- Performed object-oriented analysis and design using UML, including development of class diagrams, sequence diagrams and state diagrams, and implemented these diagrams in Microsoft Visio.
- Involved in writing client-side validations using JavaScript and CSS.
- Designed and developed the UI using Struts view components HTML, CSS and JavaScript.
- Developed messaging functionality using the JMS API from the J2EE package.
- Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Involved in designing test plans, test cases and overall Unit testing of the system.
- Prepared documentation and participated in preparing user's manual for the application.
Environment: Java, JQuery, Junit, Servlets, Spring 2.0, Web Logic, Eclipse, JSP, Windows XP, HTML, CSS, JavaScript, and XML.