
Big Data Hadoop Developer Resume

Atlanta, GA

SUMMARY:

  • 8 years of professional experience in the IT industry, involved in developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications.
  • Around 4 years of experience ranging from Data Warehousing to Big Data tools like Hadoop and Spark.
  • Developed and delivered Big Data solutions for 4 different clients using tools like Hadoop, Spark, Hive, HBase, SQL, Pig, Talend, Kafka, Flume, Sqoop, Oozie, and Azkaban.
  • Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture. Developed a pipeline using Java, Kafka, and Python to load data from a JMS server to Hive, with automatic ingestion and quality audits of the data into the RAW layer of the Data Lake (a minimal sketch of this pattern appears after this list).
  • Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Spark SQL, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka, and Storm.
  • Excellent understanding/knowledge of Hadoop distributed system architecture and design principles.
  • Experience in analyzing data using Hive, Pig Latin, and HBase.
  • Excellent knowledge of distributed storage (HDFS) and distributed processing (MapReduce, YARN).
  • Hands-on experience in developing MapReduce programs according to requirements.
  • Hands-on experience in performing data cleaning and pre-processing using Python and the Talend Data Preparation tool.
  • Hands-on Data Warehousing experience with Extraction, Transformation, and Loading (ETL) processes using Talend Open Studio for Data Integration and Big Data.
  • Expertise in optimizing traffic across the network using Combiners, joining multiple-schema datasets using Joins, and organizing data using Partitioners and Buckets.
  • Expertise with NoSQL databases such as HBase and MapR-DB.
  • Expertise in using ETL tools like Talend and SSIS (data migration, cleansing).
  • Expertise in working with different kinds of data files such as XML and JSON, and with databases.
  • Hands-on experience in importing and exporting data between HDFS and databases like MySQL, Oracle, Teradata, and DB2 using Sqoop.
  • Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregation.
  • Strong hold over Hive and Pig core functionality; wrote Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
  • Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL.
  • Worked extensively on different Hadoop distributions like CDH, Hortonworks, and MapR.
  • Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce, and Hive.
  • Proficient in using various IDEs like Eclipse and PyCharm.
  • Hands-on experience with job workflow scheduling and monitoring tools like Oozie and Azkaban.
  • Worked on the Spark ecosystem, including Spark SQL, SparkR, PySpark, and Spark Streaming, for batch processing of large datasets and execution of complex workflows, with Python as the primary programming language.
  • Excellent programming skills at a higher level of abstraction using Python and Spark.
  • Good understanding of processing real-time data using Spark.
  • Experience in all phases of the software development life cycle.
  • Extensive experience in using Oracle, SQL Server, DB2, and MySQL databases.
  • Extensive experience following Agile methodology throughout the development process.
  • Supported development, testing, and operations teams during new system deployments.
  • Very good knowledge of SAP BI (ETL) and data warehouse tools.
  • Good experience creating real-time data streaming solutions using Kafka.
  • Hands-on experience with UNIX scripting.
  • Responsible for designing and implementing ETL processes using Talend to load data from different sources.
  • Experience with cloud configuration in Amazon Web Services (AWS).
  • Good knowledge of web services.
  • Reporting experience using SSRS, SAP BO, and Tableau.
  • Hands-on experience with ETL tools such as SSIS and Talend, and reporting tools like SSRS and Tableau.
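
As a flavor of the pub-sub ingestion pattern mentioned above, here is a minimal PySpark Structured Streaming sketch, not the actual project code: the broker address, topic, and Data Lake paths are illustrative assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# Hedged sketch: JMS-fed Kafka topic -> basic quality audit -> RAW layer.
# Broker, topic, and paths are illustrative; the spark-sql-kafka package
# is assumed to be available on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length

spark = (SparkSession.builder
         .appName("raw-layer-ingestion")
         .enableHiveSupport()
         .getOrCreate())

# Subscribe to the Kafka topic that the JMS bridge publishes to.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
          .option("subscribe", "jms.events")                  # assumed topic
          .load())

# Minimal quality audit: drop empty payloads before they reach RAW.
audited = (stream.selectExpr("CAST(value AS STRING) AS payload")
           .where(col("payload").isNotNull() & (length(col("payload")) > 0)))

# Land the audited records in the RAW layer of the Data Lake.
(audited.writeStream
        .format("parquet")
        .option("path", "/datalake/raw/jms_events")            # assumed path
        .option("checkpointLocation", "/datalake/chk/jms_events")
        .start())
```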

TECHNICAL SKILLS:

Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Cassandra, Kafka, Python, Spark

Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting.

Web Services: MVC, SOAP, REST

RDBMS: Oracle 10g, MySQL, SQL Server, DB2.

NoSQL: HBase, MapR-DB, Cassandra

Databases: Oracle, Teradata, DB2, MS Azure.

ETL tools: Talend, SSIS

Methodologies: Agile, Waterfall, Test-Driven Development

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Big Data Hadoop Developer

Responsibilities:

  • Involved in designing the data pipeline end to end to ingest data into the Data Lake.
  • Developed a canonical model on top of the Hadoop Data Lake and enabled consumption layers for business end users.
  • Developed Talend jobs to load data from various sources like Oracle, Teradata, and SQL Server.
  • Worked on handling different data formats including XML, JSON, ORC, and Parquet.
  • Strong experience with Hadoop file system and Hive commands for data mappings.
  • Worked on slowly changing dimensions in Hive and implemented exception logic to capture corrupted data.
  • Developed Hive queries to load data into different layers (stg, hub, and pub), where each source has its own use cases.
  • Developed PySpark code to load data from stg to hub, implementing the business logic (a minimal sketch appears after this list).
  • Developed Spark SQL code implementing business logic, with Python as the programming language.
  • Designed, developed, and delivered the jobs and transformations that enrich the data and progressively elevate it for consumption in the pub layer of the Data Lake.
  • Worked on sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Responsible for developing the unit test cases from Kafka producer to consumer and for establishing the connection parameters from JMS to Kafka.
  • Worked on ingestion from RDBMS sources using Sqoop and Flume.
  • Worked on designing the Hive table structures, including creation, partitioning, and transformations over the data based on business requirements, enriching data across the three-layered BI Data Lake.
  • Developed documentation for individual modules like Application, Hive, Operations, and Housekeeping in accordance with system integration and error handling.
  • Worked on Hive data modeling and delegated a separate data hub for multiple Hadoop datasets.
  • Transformed the existing ETL logic into Hadoop mappings.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Worked on CDC logic in Hive to capture deletes, updates, and inserts (a minimal sketch appears after this list).
  • Developed Oozie workflows for the execution process.
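
A minimal sketch of the stg-to-hub promotion described above; the table, column, and key names are illustrative assumptions, not the project's actual schema.

```python
# Hedged sketch of a stg -> hub load in PySpark; names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, trim, upper

spark = (SparkSession.builder
         .appName("stg-to-hub")
         .enableHiveSupport()
         .getOrCreate())

stg = spark.table("stg.customer")  # assumed staging table

# Representative business logic: standardize the key, de-duplicate,
# and stamp the load time before promoting to the hub layer.
hub = (stg.withColumn("customer_key", upper(trim(stg["customer_id"])))
          .dropDuplicates(["customer_key"])
          .withColumn("load_ts", current_timestamp()))

# Target table assumed to exist with a matching column order.
hub.write.insertInto("hub.customer", overwrite=True)
```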
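
And a hedged sketch of the CDC classification: compare the new extract against the previous snapshot and flag inserts, updates, and deletes. The table names, key, and row_hash column are assumptions.

```python
# Hedged sketch of CDC in Hive via Spark SQL; schema is illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-cdc").enableHiveSupport().getOrCreate()

cdc_sql = """
SELECT
  COALESCE(n.id, o.id) AS id,
  CASE
    WHEN o.id IS NULL THEN 'I'               -- key only in new extract: insert
    WHEN n.id IS NULL THEN 'D'               -- key only in old snapshot: delete
    WHEN n.row_hash <> o.row_hash THEN 'U'   -- payload changed: update
    ELSE 'N'                                 -- unchanged
  END AS cdc_flag
FROM hub.orders_new n
FULL OUTER JOIN hub.orders_prev o
  ON n.id = o.id
"""
# Target table assumed to exist with matching columns.
spark.sql(cdc_sql).write.insertInto("hub.orders_cdc", overwrite=True)
```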

Environment: Hadoop Frameworks, HDP 2.5, Hive 2.1, Sqoop, Python, Spark 2.1, Oracle, Teradata, SQL Server, Talend.

Confidential, Atlanta, GA

Big Data Hadoop Developer

Responsibilities:

  • Developed a 10-node cluster in designing the Data Lake with the MapR distribution.
  • Involved in designing the data pipeline end to end to ingest data into the Data Lake.
  • Worked on standardization of XML from streaming data and on loading the data to HDFS.
  • Worked on creating Hive tables with struct format.
  • Developed Spark code to load data into Elasticsearch (a minimal sketch appears after this list).
  • Replicated the SQL Server schema in Hadoop without losing the business logic.
  • Developed Python code to validate input XML files, separating bad data before ingestion into Hive and the Data Lake (see the sketch after this list).
  • Designed and created Hive tables to load data and perform ETL logic.
  • Responsible for developing the parser that takes the Kafka streaming data, segregates it into Hive tables, converts it from XML to text files, and ingests it into the RAW layer.
  • Designed, developed, and delivered the jobs and transformations that enrich the data and progressively elevate it for consumption in the pub layer of the Data Lake.
  • Developed PySpark code to load data from Hive tables into HBase.
  • Migrated existing ETL flows from SSIS to Hadoop.
  • Created Hive tables from HBase tables based on business needs.
  • Worked on versioning of data in HBase.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Used the Spark API over the MapR cluster to perform analytics on data in Hive.
  • Developed Python scripts using both DataFrames/SQL and RDD/MapReduce in Spark 2.1 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, and DataFrames.
  • Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
  • Designed and developed Jenkins jobs to execute the Python and Hive scripts.
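
A minimal sketch of the Hive-to-Elasticsearch load mentioned above, assuming the elasticsearch-hadoop connector jar is on the Spark classpath; the node address, index, and table names are illustrative.

```python
# Hedged sketch: load a Hive table into Elasticsearch via the es-hadoop
# Spark SQL data source. Endpoint, index, and table names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-es")
         .enableHiveSupport()
         .getOrCreate())

df = spark.table("pub.events")  # assumed Hive table in the pub layer

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", "es-host:9200")    # assumed ES endpoint
   .option("es.resource", "events/doc")   # assumed index/type
   .mode("append")
   .save())
```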
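
And a hedged sketch of the XML validation step: a plain-Python well-formedness check that quarantines bad files before they reach Hive. Directory names are illustrative.

```python
# Hedged sketch: route malformed XML to quarantine before ingestion.
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

LANDING = Path("/data/landing")        # assumed landing zone
QUARANTINE = Path("/data/quarantine")  # assumed bad-record area

def is_well_formed(xml_file: Path) -> bool:
    """Return True if the file parses as well-formed XML."""
    try:
        ET.parse(xml_file)
        return True
    except ET.ParseError:
        return False

for f in LANDING.glob("*.xml"):
    if not is_well_formed(f):
        # Separate bad data so only clean XML reaches the Data Lake.
        shutil.move(str(f), str(QUARANTINE / f.name))
```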

Environment: Hadoop Frameworks, MapR 6.1, MapR 5.2, Hive 2.1, Sqoop, Python, Spark 2.1, SQL Server, SSIS

Confidential, ATLANTA

Hadoop Developer

Responsibilities:

  • Worked on writing MapReduce jobs.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS (a minimal sketch appears after this list).
  • Developed scripts in Hive for transforming data; extensively used event joins, filtering, and pre-aggregations.
  • Load balancing of ETL processes, database performance tuning, and capacity monitoring using Talend.
  • Involved in pivoting the HDFS data from rows to columns and columns to rows (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Used Sqoop to import customer information data from a SQL Server database into HDFS for data processing.
  • Developed shell scripts for scheduling and automating the job flow.
  • Developed a workflow using Talend to automate the tasks of loading the data into HDFS.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
  • Responsible for processing ingested raw data using Kafka and Hive.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Involved in loading data from the UNIX file system to HDFS.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS.
  • Worked extensively on the core and Spark SQL modules of Spark.
  • Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get, or copyToLocal.
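
A hedged sketch of the scheduled incremental Sqoop load, driven from Python. The JDBC string, table, and check column are illustrative; in practice the last imported value would come from the previous run's metadata rather than a literal.

```python
# Hedged sketch: incremental Sqoop import invoked from a scheduler script.
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://db-host:1433;databaseName=sales",  # assumed
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",  # keep credentials off the CLI
    "--table", "customers",
    "--target-dir", "/data/raw/customers",
    "--incremental", "append",
    "--check-column", "customer_id",
    "--last-value", "1048576",  # placeholder for the last imported key
]
subprocess.run(cmd, check=True)
```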
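
And a minimal sketch of the rows-to-columns pivot; the sample data and column names are illustrative. (The reverse, columns to rows, can be expressed with Spark SQL's stack function.)

```python
# Hedged sketch: pivot long-format rows into one column per attribute.
from pyspark.sql import SparkSession
from pyspark.sql.functions import first

spark = SparkSession.builder.appName("pivot-demo").getOrCreate()

long_df = spark.createDataFrame(
    [("c1", "city", "Atlanta"), ("c1", "state", "GA"), ("c2", "city", "Duluth")],
    ["id", "attr", "value"],
)

# Rows -> columns: one output column per distinct attribute name.
wide_df = long_df.groupBy("id").pivot("attr").agg(first("value"))
wide_df.show()
```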

Environment: Hadoop, YARN, Hive, Pig, Spark, PySpark, Talend, Oozie, Sqoop, Flume, AWS, Redshift

Confidential, Duluth, GA

Hadoop/Spark Developer

Responsibilities:

  • Developed a 16-node cluster in designing the Data Lake with the Hortonworks distribution.
  • Worked on building a channel from Sonic JMS to consume messages from the SOAP application.
  • Responsible for data audit and data quality; the first point of contact for issues in data from different sources.
  • Involved in designing the data pipeline end to end to ingest data into the Data Lake.
  • Developed a Kafka producer which brings the data streams from the JMS client and passes them to the Kafka consumer (a minimal sketch appears after this list).
  • Worked on programming the Kafka producer and consumer with the connection parameters and methods, from Oracle Sonic JMS through to the Data Lake (HDFS).
  • Developed Python code to validate input XML files, separating bad data before ingestion into Hive and the Data Lake.
  • Responsible for developing the parser that takes the Kafka streaming data, segregates it into Hive tables, converts it from XML to text files, and ingests it into the RAW layer.
  • Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning (see the configuration sketch after this list).
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Python.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Python.
  • Developed code for reading multiple data formats on HDFS using PySpark.
  • Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
  • Utilized capabilities of Tableau such as data extracts, data blending, forecasting, dashboard actions, and table calculations to build dashboards.
  • Did POCs on cloud-based technologies like Azure and AWS.
  • Developed complex dashboards using Tableau table calculations, quick filters, context filters, hierarchies, parameters, and action filters.
  • Published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
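
A hedged sketch of the JMS-to-Kafka bridge using the kafka-python client. Because Python access to Sonic JMS depends on the site's bridge library, the JMS side is abstracted as a generator; the broker and topic names are illustrative.

```python
# Hedged sketch: forward messages from a JMS source onto a Kafka topic.
from kafka import KafkaProducer

def jms_messages():
    """Placeholder for messages consumed from the Sonic JMS queue."""
    yield b"<order><id>1</id></order>"  # illustrative XML payload

producer = KafkaProducer(bootstrap_servers="broker1:9092")  # assumed broker

for payload in jms_messages():
    # Forward each JMS message onto the topic the Kafka consumer reads.
    producer.send("jms.orders", value=payload)  # assumed topic

producer.flush()
```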
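
And a hedged sketch of the Spark Streaming tuning knobs mentioned above: batch interval, parallelism, and executor memory. The values are illustrative starting points, not recommendations.

```python
# Hedged sketch of Spark Streaming tuning configuration; values are
# illustrative and would be derived from workload measurements.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
        .setAppName("tuned-streaming")
        .set("spark.default.parallelism", "48")   # ~2-3x total executor cores
        .set("spark.executor.memory", "4g")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))

sc = SparkContext(conf=conf)

# The batch interval controls how often micro-batches form; it should
# comfortably exceed the average batch processing time to avoid backlog.
ssc = StreamingContext(sc, batchDuration=10)  # 10-second batches
```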

Environment: Hadoop Frameworks, HDP 2.5.3, Kafka, Hive, Sqoop, Python, Spark, Shell Scripting, Oracle SONIC JMS, Java 8.0 & 7.0, Eclipse, Tableau.

Confidential, ATLANTA

SSIS/Hadoop Developer

Responsibilities:

  • Worked on writing MapReduce jobs.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Developed scripts in Pig for transforming data; extensively used event joins, filtering, and pre-aggregations.
  • Developed Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
  • Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting (a minimal sketch appears after this list).
  • Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
  • Developed Hive UDFs.
  • Implemented Kafka for streaming data, then filtered and processed the data.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Developed shell scripts for scheduling and automating the job flow.
  • Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
  • Developed MapReduce jobs to calculate the total usage of data by commercial routers in different locations, and developed MapReduce programs for data sorting in HDFS.
  • Monitored the cluster (jobs and performance) and fine-tuned when necessary using tools like Cloudera Manager and Ambari.
  • Load balancing of ETL processes and database performance tuning using ETL processing tools.
  • Loaded the data from Teradata to HDFS using the Teradata Hadoop connectors.
  • Optimized Hive queries to extract customer information from HDFS or HBase.
  • Developed Pig Latin scripts to aggregate the log files of the business clients using Kibana.
  • Performed data scrubbing and processing with Oozie for workflow automation and coordination.
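
A minimal sketch of a partitioned, bucketed Hive table and a reporting metric over it, issued through Spark SQL; the database, table, and column names are illustrative.

```python
# Hedged sketch: partitioned/bucketed Hive DDL plus a reporting query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.usage (
        router_id STRING,
        bytes_used BIGINT
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (router_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Compute a reporting metric against the partitioned, bucketed data;
# the partition filter prunes the scan to a single day.
spark.sql("""
    SELECT router_id, SUM(bytes_used) AS total_bytes
    FROM analytics.usage
    WHERE load_date = '2017-01-01'
    GROUP BY router_id
""").show()
```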

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera HDFS, Eclipse.

Confidential, San Ramon, CA

SQL/Hadoop Developer

Responsibilities:

  • Worked on writing MapReduce jobs (a minimal Hadoop Streaming sketch appears after this list).
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Developed scripts in Pig for transforming data; extensively used event joins, filtering, and pre-aggregations.
  • Developed Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
  • Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
  • Developed Hive UDFs.
  • Implemented Kafka for streaming data, then filtered and processed the data.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Developed shell scripts for scheduling and automating the job flow.
  • Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
  • Monitored the cluster (jobs and performance) and fine-tuned when necessary using Cloudera Manager.
  • Load balancing of ETL processes and database performance tuning using ETL processing tools.
  • Loaded the data from Teradata to HDFS using the Teradata Hadoop connectors.
  • Optimized Hive queries to extract customer information from HDFS or HBase.
  • Performed data scrubbing and processing with Oozie for workflow automation and coordination.
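
A hedged sketch of a Python MapReduce job of the kind described above, written for Hadoop Streaming; the input layout and key field are illustrative. In practice the mapper and reducer would each live in their own script.

```python
# Hedged sketch of a Hadoop Streaming job; invoked roughly as:
#   hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
#       -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out
import sys

def mapper():
    # Emit "key<TAB>1" per record; the key is assumed to be the first
    # tab-separated column (e.g. a router or customer id).
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")

def reducer():
    # Streaming sorts mapper output by key, so equal keys arrive together;
    # sum the counts per key and emit one total per key.
    current_key, total = None, 0
    for line in sys.stdin:
        key, _, count = line.rstrip("\n").partition("\t")
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{total}")
            total = 0
        current_key = key
        total += int(count or 0)
    if current_key is not None:
        print(f"{current_key}\t{total}")
```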

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera HDFS, Eclipse.

Confidential

SQL Developer

Responsibilities:

  • Interacted with the team on analysis, design, and development of the database using ER diagrams, normalization, and relational database concepts.
  • Involved in design, development, and testing of the system.
  • Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
  • Developed user-defined functions and created views.
  • Created triggers to maintain referential integrity.
  • Implemented exception handling.
  • Worked on client requirements and wrote complex SQL queries to generate Crystal Reports.
  • Created and automated regular jobs.
  • Tuned and optimized SQL queries using execution plans and Profiler.
  • Developed the user interface (the view component of the MVC architecture) with JSP, Struts custom tag libraries, HTML5, and JavaScript.
  • Used the DOJO toolkit to construct Ajax requests and build dynamic web pages using JSPs, DHTML, and JavaScript; extensively used jQuery in web-based applications.
  • Developed the controller component with Servlets and action classes.
  • Developed business components (the model components) using Enterprise JavaBeans (EJB).
  • Established schedule and resource requirements by planning, analyzing, and documenting the development effort, including timelines, risks, test requirements, and performance targets.
  • Analyzed system requirements and prepared the system design document.
  • Developed dynamic user interfaces with HTML and JavaScript using JSP and Servlet technology.
  • Designed and developed a subsystem where Java Message Service (JMS) applications communicate with MQ for data exchange between different systems.
  • Used JMS elements for sending and receiving messages.
  • Used Hibernate for mapping Java classes to database tables.
  • Created and executed test plans using Quality Center (TestDirector).
  • Mapped requirements to the test cases in Quality Center.
  • Supported system testing and user acceptance testing.
  • Rebuilt indexes and tables as part of the performance tuning exercise.
  • Involved in performing database backup and recovery.
  • Worked on documentation using MS Word.

Environment: SQL Server 7.0/2000, SQL, T-SQL, BCP, Visual Basic 6.0/5.0, Crystal Reports 7/4.5, Java, J2EE, JDBC, EJB, JSP, EL, JSTL, JUnit, XML.
