Big Data Hadoop Developer Resume
Atlanta, GA
SUMMARY:
- 8 years of professional experience in the IT industry, involved in developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications.
- Around 4 years of experience ranging from Data Warehousing to Big Data tools such as Hadoop and Spark.
- Developed and delivered Big Data solutions for 4 different clients using tools such as Hadoop, Spark, Hive, HBase, SQL, Pig, Talend, Kafka, Flume, Sqoop, Oozie, and Azkaban.
- Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture. Developed a pipeline using Java, Kafka, and Python to load data from a JMS server into Hive, with automatic ingestion and quality audits of the data into the RAW layer of the Data Lake.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Spark SQL, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka, and Storm.
- Excellent understanding of Hadoop distributed system architecture and design principles.
- Experience in analyzing data using Hive, Pig Latin, and HBase.
- Excellent knowledge of distributed storage (HDFS) and distributed processing (MapReduce, YARN).
- Hands-on experience in developing MapReduce programs according to requirements.
- Hands-on experience in data cleaning and preprocessing using Python and the Talend Data Preparation tool.
- Hands-on Data Warehousing experience with Extraction, Transformation and Loading (ETL) processes using Talend Open Studio for Data Integration and Big Data.
- Expertise in optimizing network traffic using Combiners, joining multiple-schema datasets using Joins, and organizing data using Partitioners and Buckets.
- Expertise with NoSQL databases such as HBase and MapR-DB.
- Expertise in using ETL tools such as Talend and SSIS (data migration, cleansing).
- Expertise in working with different kinds of data files, such as XML and JSON, as well as databases.
- Hands-on experience importing and exporting data between HDFS and databases such as MySQL, Oracle, Teradata, and DB2 using Sqoop.
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregation.
- Strong command of Hive and Pig core functionality; wrote Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Good knowledge of database connectivity (JDBC) for databases such as Oracle, DB2, SQL Server, and MySQL.
- Worked extensively with different Hadoop distributions such as CDH, Hortonworks, and MapR.
- Strong experience building real-time streaming applications and batch-style large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce, and Hive (a streaming sketch follows this summary).
- Proficient with IDEs such as Eclipse and PyCharm.
- Hands-on experience with job workflow scheduling and monitoring tools such as Oozie and Azkaban.
- Worked on the Spark ecosystem, including Spark SQL, SparkR, PySpark, and Spark Streaming, for batch processing of applications that produce large data sets and for executing complex workflows.
- Excellent programming skills at a high level of abstraction using Python and Spark.
- Worked on the Spark ecosystem with Python as the primary programming language.
- Good understanding of real-time data processing using Spark.
- Experience in all phases of the software development life cycle.
- Extensive experience using Oracle, SQL Server, DB2, and MySQL databases.
- Extensive experience developing software following Agile methodologies.
- Supported development, testing, and operations teams during new system deployments.
- Very good knowledge of SAP BI (ETL) and data warehouse tools.
- Good experience creating real-time data streaming solutions using Kafka.
- Hands-on experience with UNIX scripting.
- Responsible for designing and implementing ETL processes using Talend to load data from different sources.
- Experience with cloud configuration in Amazon Web Services (AWS).
- Good knowledge of web services.
- Built reports using SSRS, SAP BO, and Tableau.
- Hands-on experience with ETL tools such as SSIS and Talend and with reporting tools such as SSRS and Tableau.
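A minimal PySpark sketch of the kind of Kafka-based streaming ingestion referenced in this summary, assuming Spark 2.x with the spark-sql-kafka package available; the broker addresses, topic name, and HDFS paths are illustrative placeholders, not values from any engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Illustrative sketch: stream events from a Kafka topic into an HDFS raw zone.
spark = (SparkSession.builder
         .appName("kafka-raw-ingest")
         .getOrCreate())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
       .option("subscribe", "events_topic")                              # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; keep the payload and ingestion timestamp.
events = raw.select(col("value").cast("string").alias("payload"),
                    col("timestamp").alias("ingest_ts"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/raw/events")           # landing (RAW) layer, placeholder path
         .option("checkpointLocation", "/chk/events")  # required for fault tolerance
         .outputMode("append")
         .start())

query.awaitTermination()
```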
TECHNICAL SKILLS:
Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, Yarn, Pig, Hive, HBase, Sqoop, Flume, Oozie, Zookeeper, Cassandra, Kafka, Python, Spark
Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting.
Web Services: MVC, SOAP, REST
RDBMS: Oracle 10g, MySQL, SQL Server, DB2.
NoSQL: HBase, MapR-DB, Cassandra
Databases: Oracle, Teradata, DB2, MS Azure.
ETL tools: Talend, SSIS
Methodologies: Agile, Waterfall, Test-Driven Development
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Big Data Hadoop Developer
Responsibilities:
- Involved in designing the end-to-end data pipeline to ingest data into the Data Lake.
- Developed a canonical model on top of the Hadoop Data Lake and enabled consumption layers for business end users.
- Developed Talend jobs to load data from various sources such as Oracle, Teradata, and SQL Server.
- Worked on handling different data formats including XML, JSON, ORC, and Parquet.
- Strong experience with Hadoop file system and Hive commands for data mappings.
- Worked on Slowly Changing Dimensions in Hive and implemented exception logic to capture corrupted data.
- Developed Hive queries to load data into the different layers (stg, hub, and pub), where each source has its own use cases.
- Developed PySpark code to load data from the stg layer to the hub layer, implementing the business logic.
- Developed Spark SQL code implementing business logic, with Python as the programming language.
- Designed, developed, and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the pub layer of the Data Lake.
- Worked with sequence files, map-side joins, bucketing, and partitioning for Hive performance and storage improvements.
- Responsible for developing unit test cases from the Kafka producer to the consumer and for establishing the connection parameters from JMS to Kafka.
- Worked on ingestion from RDBMS sources using Sqoop and Flume.
- Worked on designing the Hive table structures, including creation, partitioning, and transformations over the data based on business requirements, and enriched data across the three-layered BI Data Lake.
- Developed documentation for individual modules such as Application, Hive, Operations, and Housekeeping, covering system integration and error handling.
- Worked on Hive data modelling and dedicated a separate data hub for multiple Hadoop datasets.
- Transformed the existing ETL logic to Hadoop mappings.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on CDC logic in Hive to capture deletes, updates, and inserts (a sketch follows this list).
- Developed Oozie workflows for the execution process.
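A minimal PySpark sketch of the stg-to-hub CDC step described above; the table, key, and hash column names (stg.customer, customer_key, row_hash) are hypothetical, and the business logic is simplified to a full outer join that flags inserts, updates, and deletes.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative CDC pass: compare the staged snapshot against the hub table and
# flag inserts (I), updates (U), and deletes (D). Names are hypothetical.
spark = (SparkSession.builder
         .appName("stg-to-hub-cdc")
         .enableHiveSupport()
         .getOrCreate())

stg = spark.table("stg.customer")   # today's staged extract
hub = spark.table("hub.customer")   # current hub state

joined = stg.alias("s").join(hub.alias("h"), on="customer_key", how="full_outer")

cdc = joined.withColumn(
    "cdc_flag",
    F.when(F.col("h.row_hash").isNull(), F.lit("I"))                # new key
     .when(F.col("s.row_hash").isNull(), F.lit("D"))                # missing from stg
     .when(F.col("s.row_hash") != F.col("h.row_hash"), F.lit("U"))  # changed row
     .otherwise(F.lit("N")))                                        # unchanged

changes = (cdc.filter(F.col("cdc_flag") != "N")
              .select("customer_key",
                      F.coalesce(F.col("s.name"), F.col("h.name")).alias("name"),
                      "cdc_flag"))

# Append the classified changes to a hub-side CDC table (hypothetical name).
changes.write.mode("append").saveAsTable("hub.customer_cdc")
```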
Environment: Hadoop frameworks, HDP 2.5, Hive 2.1, Sqoop, Python, Spark 2.1, Oracle, Teradata, SQL Server, Talend.
Confidential, Atlanta, GA
Big Data Hadoop Developer
Responsibilities:
- Developed a 10-node cluster while designing the Data Lake on the MapR distribution.
- Involved in designing the end-to-end data pipeline to ingest data into the Data Lake.
- Worked on standardization of XML from streaming data and loading the data into HDFS.
- Worked on creating Hive tables with struct data types.
- Developed Spark code to load data into Elasticsearch.
- Replicated the SQL Server schema in Hadoop without losing the business logic.
- Developed Python code to validate input XML files and separate bad data before ingestion into Hive and the Data Lake (sketched after this list).
- Designed and created Hive tables to load data and perform ETL logic.
- Responsible for developing the parser that consumes the Kafka streaming data, segregates it into Hive tables, converts it from XML to text files, and ingests it into the RAW layer.
- Designed, developed, and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the pub layer of the Data Lake.
- Developed PySpark code to load data from Hive tables into HBase.
- Migrated existing SSIS ETL flows to Hadoop.
- Created Hive tables from HBase tables based on business needs.
- Worked on versioning of data in HBase.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Used the Spark API over the MapR cluster to perform analytics on data in Hive.
- Developed Python scripts using both DataFrames/SQL and RDD/MapReduce in Spark 2.1 for data aggregation and queries and for writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, and DataFrames.
- Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
- Designed and developed Jenkins jobs to execute the Python and Hive scripts.
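A minimal Python sketch of the XML validation step mentioned above; the directory paths and required tags are hypothetical, and a real job would add schema (XSD) checks and logging before handing good files to the Hive ingestion.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Hypothetical landing, good, and quarantine directories.
INBOX = "/data/landing/xml"
GOOD = "/data/landing/good"
BAD = "/data/landing/bad"

REQUIRED_TAGS = {"OrderId", "OrderDate"}  # illustrative mandatory elements


def is_valid(path):
    """Return True if the file parses as XML and contains the required tags."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    present = {elem.tag for elem in root.iter()}
    return REQUIRED_TAGS.issubset(present)


def segregate():
    """Move well-formed files to GOOD (for Hive ingestion) and the rest to BAD."""
    for name in os.listdir(INBOX):
        if not name.endswith(".xml"):
            continue
        src = os.path.join(INBOX, name)
        dest = GOOD if is_valid(src) else BAD
        shutil.move(src, os.path.join(dest, name))


if __name__ == "__main__":
    segregate()
```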
Environment: Hadoop frameworks, MapR 6.1, MapR 5.2, Hive 2.1, Sqoop, Python, Spark 2.1, SQL Server, SSIS.
Confidential, ATLANTA
Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Hive scripts for transforming data; extensively used event joins, filtering, and pre-aggregations.
- Performed load balancing of ETL processes, database performance tuning, and capacity monitoring using Talend.
- Involved in pivoting HDFS data from rows to columns and columns to rows (illustrated after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Used Sqoop to import customer information from a SQL Server database into HDFS for data processing.
- Developed shell scripts for scheduling and automating the job flow.
- Developed a workflow using Talend to automate the tasks of loading the data into HDFS.
- Developed Sqoop scripts for importing and exporting data to and from HDFS and Hive.
- Responsible for processing ingested raw data using Kafka and Hive.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Involved in loading data from the UNIX file system into HDFS.
- Handled importing data from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS.
- Worked extensively on the Spark Core and Spark SQL modules.
- Involved in exporting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get / copyToLocal commands.
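A minimal PySpark sketch of the row-to-column pivot referenced above; the usage dataset, its column names, and the HDFS paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pivot-example").getOrCreate()

# Hypothetical long-format data on HDFS: one row per (customer, month, amount).
long_df = spark.read.parquet("/data/usage/monthly")

# Rows -> columns: one row per customer, one column per month.
wide_df = (long_df.groupBy("customer_id")
           .pivot("month")
           .agg(F.sum("amount")))
wide_df.write.mode("overwrite").parquet("/data/usage/monthly_wide")

# Columns -> rows (unpivot) via a stack() expression over the month columns.
months = [c for c in wide_df.columns if c != "customer_id"]
stack_expr = "stack({n}, {pairs}) as (month, amount)".format(
    n=len(months),
    pairs=", ".join("'{m}', `{m}`".format(m=m) for m in months))
back_to_long = wide_df.select("customer_id", F.expr(stack_expr))
back_to_long.show(5)
```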
Environment: Hadoop, YARN, Hive, Pig, Spark, PySpark, Talend, Oozie, Sqoop, Flume, AWS, Redshift
Confidential, Duluth, GA
Hadoop/Spark Developer
Responsibilities:
- Developed a 16-node cluster while designing the Data Lake on the Hortonworks distribution.
- Worked on building a channel from Sonic JMS to consume messages from the SOAP application.
- Responsible for data audit and data quality; first point of contact for issues in data from different sources.
- Involved in designing the end-to-end data pipeline to ingest data into the Data Lake.
- Developed a Kafka producer that brings the data streams from the JMS client and passes them to the Kafka consumer (see the producer/consumer sketch after this list).
- Worked on programming the Kafka producer and consumer with the connection parameters and methods, from Oracle Sonic JMS all the way to the Data Lake (HDFS).
- Developed Python code to validate input XML files and separate bad data before ingestion into Hive and the Data Lake.
- Responsible for developing the parser that consumes the Kafka streaming data, segregates it into Hive tables, converts it from XML to text files, and ingests it into the RAW layer.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries and for writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Python.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Python.
- Developed code for reading multiple data formats on HDFS using PySpark.
- Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
- Utilized Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions, and table calculations to build dashboards.
- Completed POCs on cloud-based technologies such as Azure and AWS.
- Developed complex dashboards using Tableau table calculations, quick filters, context filters, hierarchies, parameters, and action filters.
- Published customized interactive reports and dashboards and scheduled reports using Tableau Server.
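A minimal Python sketch of the producer/consumer pattern described above (the JMS-to-Kafka bridge), using the kafka-python client as an assumption; the broker list, topic name, and output path are placeholders, and the JMS read itself is stubbed out.

```python
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092", "broker2:9092"]  # placeholder broker list
TOPIC = "jms_xml_events"                    # placeholder topic name


def publish(messages):
    """Forward messages pulled from the JMS client (stubbed here) to Kafka."""
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    for msg in messages:
        producer.send(TOPIC, value=msg.encode("utf-8"))
    producer.flush()


def consume_to_raw(out_path="/tmp/raw_events.txt"):
    """Consume from the topic and append payloads to a local staging file
    (a real job would land them in the HDFS RAW layer instead)."""
    consumer = KafkaConsumer(TOPIC,
                             bootstrap_servers=BROKERS,
                             group_id="raw-ingest",
                             auto_offset_reset="earliest")
    with open(out_path, "a") as out:
        for record in consumer:
            out.write(record.value.decode("utf-8") + "\n")


if __name__ == "__main__":
    publish(["<Order><OrderId>1</OrderId></Order>"])  # sample XML payload
    # consume_to_raw()  # run separately as a long-lived consumer
```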
Environment: Hadoop frameworks, HDP 2.5.3, Kafka, Hive, Sqoop, Python, Spark, shell scripting, Oracle Sonic JMS, Java 8.0 & 7.0, Eclipse, Tableau.
Confidential, ATLANTA
SSIS/Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data; extensively used event joins, filtering, and pre-aggregations.
- Developed Pig UDFs for functionality not available out of the box in Apache Pig.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience developing Hive DDL to create, alter, and drop Hive tables.
- Developed Hive UDFs.
- Implemented Kafka for streaming data and filtered and processed the data.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate total data usage by commercial routers in different locations, and MapReduce programs for data sorting in HDFS (a streaming sketch follows this list).
- Monitored the cluster (jobs and performance) and fine-tuned it when necessary using tools such as Cloudera Manager and Ambari.
- Performed load balancing of ETL processes and database performance tuning using ETL processing tools.
- Loaded data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Developed Pig Latin scripts to aggregate the business clients' log files, using Kibana.
- Performed data scrubbing and processing, with Oozie used for workflow automation and coordination.
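A minimal Hadoop Streaming sketch in Python of the router-usage MapReduce job mentioned above; the input layout (router id, location, bytes as tab-separated fields) is an assumption. It would typically be submitted with the standard streaming jar, e.g. hadoop jar hadoop-streaming.jar -files usage.py -mapper "python usage.py map" -reducer "python usage.py reduce" -input /data/router_logs -output /data/router_usage (paths illustrative).

```python
#!/usr/bin/env python
"""Hadoop Streaming job: total data usage (bytes) per router location.

Assumes tab-separated input lines of the form: router_id<TAB>location<TAB>bytes
"""
import sys


def mapper():
    # Emit (location, bytes) pairs; the framework sorts by key before reduce.
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed records
        _, location, num_bytes = parts[:3]
        print("%s\t%s" % (location, num_bytes))


def reducer():
    # Sum bytes per location; keys arrive grouped because the input is sorted.
    current, total = None, 0
    for line in sys.stdin:
        location, num_bytes = line.rstrip("\n").split("\t")
        if location != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = location, 0
        total += int(num_bytes)
    if current is not None:
        print("%s\t%d" % (current, total))


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```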
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera, HDFS, Eclipse.
Confidential, San Ramon, CA
SQL/Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data; extensively used event joins, filtering, and pre-aggregations.
- Developed Pig UDFs for functionality not available out of the box in Apache Pig (see the UDF sketch after this list).
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience developing Hive DDL to create, alter, and drop Hive tables.
- Developed Hive UDFs.
- Implemented Kafka for streaming data and filtered and processed the data.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Monitored the cluster (jobs and performance) and fine-tuned it when necessary using Cloudera Manager.
- Performed load balancing of ETL processes and database performance tuning using ETL processing tools.
- Loaded data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Performed data scrubbing and processing, with Oozie used for workflow automation and coordination.
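The Pig UDFs above were written in Java, but Pig also supports Python (Jython) UDFs; the sketch below is a hypothetical example of that route, with the function names and schemas chosen purely for illustration.

```python
# udfs.py -- a Python (Jython) UDF module for Apache Pig.
# The @outputSchema decorator is provided by Pig's Jython runtime when the
# script is registered with: REGISTER 'udfs.py' USING jython AS myudfs;

@outputSchema("location:chararray")
def normalize_location(raw):
    """Trim and upper-case a free-text location field (hypothetical example)."""
    if raw is None:
        return None
    return raw.strip().upper()


@outputSchema("gb:double")
def bytes_to_gb(num_bytes):
    """Convert a byte count to gigabytes for reporting."""
    if num_bytes is None:
        return None
    return float(num_bytes) / (1024.0 ** 3)
```

In Pig Latin the functions could then be invoked as, for example, FOREACH logs GENERATE myudfs.normalize_location(location), myudfs.bytes_to_gb(num_bytes);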
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera, HDFS, Eclipse.
Confidential
SQL Developer
Responsibilities:
- Interacted with the team on analysis and design, and developed the database using ER diagrams, normalization, and relational database concepts.
- Involved in design, development, and testing of the system.
- Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
- Developed user-defined functions and created views.
- Created triggers to maintain referential integrity.
- Implemented exception handling.
- Worked on client requirements and wrote complex SQL queries to generate Crystal Reports.
- Created and automated regular jobs.
- Tuned and optimized SQL queries using Execution Plan and Profiler.
- Developed the user interface (view component of the MVC architecture) with JSP, Struts custom tag libraries, HTML5, and JavaScript.
- Used the Dojo toolkit to construct Ajax requests and build dynamic web pages using JSPs, DHTML, and JavaScript; extensively used jQuery in web-based applications.
- Developed the controller component with Servlets and action classes.
- Developed business components (model components) using Enterprise Java Beans (EJB).
- Established schedule and resource requirements by planning, analyzing, and documenting development effort, including timelines, risks, test requirements, and performance targets.
- Analyzed system requirements and prepared the system design document.
- Developed dynamic user interfaces with HTML and JavaScript using JSP and Servlet technology.
- Designed and developed a subsystem in which Java Message Service (JMS) applications communicate with MQ for data exchange between different systems.
- Used JMS elements for sending and receiving messages.
- Used Hibernate for mapping Java classes to database tables.
- Created and executed test plans using Quality Center (Test Director).
- Mapped requirements to test cases in Quality Center.
- Supported system testing and user acceptance testing.
- Rebuilt indexes and tables as part of the performance tuning exercise.
- Involved in performing database Backup and Recovery.
- Worked on documentation using MS Word.
Environment: SQL Server 7.0/2000, SQL, T-SQL, BCP, Visual Basic 6.0/5.0, Crystal Reports 7/4.5, Java, J2EE, JDBC, EJB, JSP, EL, JSTL, JUnit, XML.