Big Data/Hadoop Developer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- 6+ years of IT experience in software development, big data management, data modeling, data integration, and implementation and testing of enterprise-class systems spanning big data frameworks, advanced analytics and Java/J2EE technologies.
- 3+ years of hands-on experience with Hadoop components and MapReduce programming for parsing and populating tables over terabytes of data.
- Extensive use of Sqoop, Flume and Oozie for data ingestion into HDFS and the Hive warehouse.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Hands-on experience with performance tuning for data processing in Hive, Impala, Spark, Pig and MapReduce, using techniques including dynamic partitioning, bucketing and file compression (see the sketch after this list).
- Experience with data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Expertise in ingesting data from HBase into Solr.
- Extensively worked on debugging using Eclipse debugger.
- Experienced in importing data from various sources using StreamSets.
- Experience with Cloudera, Hortonworks & MapR Hadoop distributions.
- Strong work ethic with a desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication and interpersonal skills; a good team player.
- Motivated to take on independent responsibility while contributing as a productive team member.
- A pleasing personality with the ability to build great rapport with clients and customers.
- Excellent verbal and written communication, paired with strong presentation and interpersonal skills.
- Strong leadership qualities, backed by a solid track record as a team player.
- Up to date with the latest business and technology trends.
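As a minimal illustration of the partitioning, bucketing and compression techniques listed above, the sketch below submits hypothetical HiveQL through the Hive CLI (hive -e) from Python. The table and column names (sales_raw, sales_opt, order_id, amount, order_date) and the bucket count are placeholders for illustration only, not details from a specific engagement.

    import subprocess

    # Hypothetical HiveQL: a partitioned, bucketed, ORC-compressed table
    # populated with dynamic partitioning from a raw staging table.
    HQL = """
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS sales_opt (
      order_id BIGINT,
      amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (order_id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('orc.compress' = 'SNAPPY');

    INSERT OVERWRITE TABLE sales_opt PARTITION (order_date)
    SELECT order_id, amount, order_date FROM sales_raw;
    """

    # Submit the inline script through the Hive CLI.
    subprocess.run(["hive", "-e", HQL], check=True)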
Spark & Transaction Processing
- Hands-on experience with Spark SQL for various business use cases.
- Used Spark SQL and Scala APIs for querying and transforming data residing in Hive.
- Used Python for Spark SQL jobs to speed up data processing (a minimal sketch follows this list).
- Replaced existing MapReduce jobs with Spark Streaming and Spark data transformations for more efficient data processing.
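A minimal PySpark sketch of the Spark SQL usage described above: query a Hive table, apply a transformation, and write the aggregated result back to Hive. The database, table and column names (warehouse.orders, customer_id, amount, warehouse.order_totals) are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Spark session with Hive support so spark.sql can see the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-spark-sql-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Query a Hive table with Spark SQL, transform it, and persist the result
    # back into Hive as a managed table.
    orders = spark.sql("SELECT customer_id, amount FROM warehouse.orders")
    totals = orders.groupBy("customer_id").sum("amount")
    totals.write.mode("overwrite").saveAsTable("warehouse.order_totals")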
Core Competencies
- Hadoop Development & Troubleshooting
- Data Analysis
- Data Visualization & Reporting in Tableau
- Real-time Streaming using Spark.
- Map Reduce Programming
- Performance Tuning of Hive & Impala
- Ingesting data from HBase to Solr
- Data import using StreamSets.
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, Solr, StreamSets.
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala.
ETL Tools: Informatica with Hadoop connector, Pentaho, Alteryx
Programming/Scripting Languages: Java, C, Scala, SQL, Unix Shell Scripting, Python
Java Technologies: jQuery, JSP, Servlets.
SQL Databases: Oracle, SQL Server 2012, SQL Server 2008 R2, DB2, Teradata
NoSQL Databases: MongoDB, HBase.
Development tools: Maven, Eclipse, IntelliJ, PyCharm
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Big Data/Hadoop Developer
Responsibilities:
- Developed and supported MapReduce programs running on the cluster.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Created Hive tables and worked on them using HiveQL.
- Involved in installing Hadoop Ecosystem components.
- Validated NameNode and DataNode status in the HDFS cluster.
- Handled 2 TB of data volume and implemented the same in Production.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Managed and reviewed Hadoop log files.
- Responsible for managing data coming from various sources.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Wrote MapReduce jobs using the Java API (a minimal streaming-style sketch follows this list).
- Wrote Hive queries for data analysis to meet the business requirements.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed UDFs for Pig data analysis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Used JUnit for unit testing and Continuum for integration testing.
- Worked hands on with ETL process.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
- Configured Ethernet bonding on all nodes to double the network bandwidth.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Imported source/target tables from the respective SAP R/3 and BW systems, created reusable transformations (Joiner, Router, Lookup, Rank, Filter, Expression and Aggregator) inside Mapplets, and built new mappings using the Designer module of Informatica PowerCenter to implement the business logic and load the customer healthcare data incrementally and in full.
- Created complex mappings using Unconnected Lookup, Aggregator and Router transformations to populate target tables efficiently.
- Optimized the mappings using various optimization techniques and debugged existing mappings with the Debugger to test and fix them.
- Updated mappings, sessions and workflows as part of ETL changes.
- Modified existing ETL code and documented the changes.
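The MapReduce work above used the Java API; purely as an illustration of the same map/reduce pattern in Python (also listed in this resume's skill set), here is a minimal Hadoop Streaming-style word-count sketch. The jar name and input/output paths in the usage comment are hypothetical.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming sketch (word count). Illustrative usage:
    #   hadoop jar hadoop-streaming.jar -files wordcount.py \
    #     -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    #     -input /data/in -output /data/out
    import sys

    def mapper():
        # Emit <word, 1> for every token read from standard input.
        for line in sys.stdin:
            for word in line.split():
                print("%s\t%d" % (word, 1))

    def reducer():
        # Streaming delivers keys sorted, so counts can be accumulated
        # per key and flushed whenever the key changes.
        current, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t", 1)
            if word != current:
                if current is not None:
                    print("%s\t%d" % (current, count))
                current, count = word, 0
            count += int(value)
        if current is not None:
            print("%s\t%d" % (current, count))

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()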
Environment: Java, Hadoop, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera CDH3/4 Distribution, Informatica 9.1
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Created external Hive tables to move data from different sources into the Cloudera cluster.
- Kept track of the data once loaded, with weekly and daily updates.
- Performed SQL joins across Hive tables to consolidate them into a single table.
- Ingested data from Hive to HBase and from HBase to Solr using Spark.
- Worked on ingesting data from various sources such as JDBC into Hive using StreamSets and Sqoop jobs.
- Imported data from Hive to Solr using StreamSets.
- Set up near-real-time indexing into Solr as an automated, scheduled process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Involved in developing ETL data pipelines for real-time streaming data using Kafka and Spark.
- Worked on a POC to pull in third-party data, used Spark SQL to create a schema RDD, and loaded it into Hive tables as structured data.
- Imported and exported data into HDFS, Pig, Hive and HBase using Sqoop.
- Managed and reviewed Hadoop log files.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data into the Hadoop system.
- Responsible for managing data coming from different data sources.
- Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
- Worked closely with admins to set up Kerberos authentication.
- Wrote test cases to test software throughout development cycles, including functional testing, unit testing and continuous integration.
- Managed operations, monitoring and troubleshooting for all Hadoop development and production issues.
- Developed and debugged performance testing and tuning in the existing application.
- Wrote detailed design specification documents and implemented business rules.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Interacted closely with web developers on application usage and on pulling data from Solr and HBase to populate the front end.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (a minimal sketch follows this list).
- Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables, handling structured data with Spark SQL.
- Implemented Spark SQL queries using Python for faster data processing.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation for the BI team.
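The Kafka-to-HDFS bullet above was implemented in Scala; the sketch below shows the same pattern in Python using Spark Streaming's legacy Kafka direct stream. The broker address, topic name, batch interval and HDFS path are hypothetical, and the spark-streaming-kafka package is assumed to be on the classpath.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-to-hdfs-sketch")
    ssc = StreamingContext(sc, 30)  # 30-second micro-batches

    # Direct stream from Kafka; each record arrives as a (key, value) pair.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    # Keep only the message payload and append each micro-batch to HDFS.
    stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/streams/events")

    ssc.start()
    ssc.awaitTermination()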
Environment: Hive, HDFS, HBase, Solr, StreamSets, Spark, Kafka, Sqoop, Scala, IntelliJ, Python, PyCharm.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Developed data pipelines using Sqoop and Flume to store data in HDFS for further processing through Spark.
- Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
- Implemented partitioning and bucketing in Hive, working with file formats and compression techniques for optimization.
- Wrote Python scripts to convert Autosys jobs and HDFS directory paths from old standards to new standards.
- Wrote Python scripts to collect the YARN job list for performance metrics (see the sketch after this list).
- Created Hive generic UDFs to process business logic that varies by policy.
- Experience customizing the MapReduce framework at different levels, such as input formats, data types, custom SerDes and partitioners.
- Pushed the data to Windows mount location for Tableau to import it for reporting.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and streamed data using Spark Streaming.
- Worked on joins to create Hive lookup tables.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Analyzed large data sets by running Hive query scripts.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed Hive scripts passing dynamic parameters using hivevar.
- Created partitioned tables in Hive for best performance and faster querying.
- Configured build scripts for multi module projects with Maven.
- Automated the process of scheduling workflow using Oozie and Autosys.
- Prepared Unit test cases and performed unit testing.
- Created external and partitioned tables in Hive for querying purposes.
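A minimal sketch of the YARN job-list metrics script mentioned above: it shells out to the `yarn application -list` command and tallies applications by state. The assumption that the state is the sixth tab-separated field follows the CLI's default report layout and is noted in the code.

    import subprocess
    from collections import Counter

    def yarn_application_rows(states="ALL"):
        # Return tab-split rows from `yarn application -list -appStates <states>`,
        # skipping the report's header lines.
        out = subprocess.check_output(
            ["yarn", "application", "-list", "-appStates", states],
            universal_newlines=True)
        return [line.split("\t") for line in out.splitlines()
                if line.startswith("application_")]

    if __name__ == "__main__":
        rows = yarn_application_rows()
        # Assumes the default column order, where the application state
        # is the sixth field of each row.
        print("total applications:", len(rows))
        print(Counter(row[5].strip() for row in rows if len(row) > 5))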
Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell scripting, Impala, Eclipse, Tableau, MySQL.
Confidential
Java Developer
Responsibilities:
- Involved in requirement Analysis, Designing, Coding and Testing.
- Developed the application following the Agile Scrum methodology.
- Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB and Action classes.
- Performed object-oriented analysis and design using UML, including development of class diagrams, sequence diagrams and state diagrams, and implemented these diagrams in Microsoft Visio.
- Involved in writing client-side validations using JavaScript and CSS.
- Designed and developed the UI using Struts view components HTML, CSS and JavaScript.
- Developed messaging functionality using the JMS API from the J2EE package.
- Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Involved in designing test plans, test cases and overall Unit testing of the system.
- Prepared documentation and participated in preparing user's manual for the application.
Environment: Java, JQuery, Junit, Servlets, Spring 2.0, Web Logic, Eclipse, JSP, Windows XP, HTML, CSS, JavaScript, and XML.