Big Data Hadoop Developer Resume
Atlanta, GA
SUMMARY:
- 8 years of professional experience in the IT industry, involved in developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications.
- Around 4 years of experience ranging from Data Warehousing to Big Data tools such as Hadoop and Spark.
- Developed and delivered Big Data solutions for 4 different clients using tools such as Hadoop, Spark, Hive, HBase, SQL, Pig, Talend, Kafka, Flume, Sqoop, Oozie, and Azkaban.
- Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture. Developed a pipeline using Java, Kafka, and Python to load data from a JMS server into Hive, with automatic ingestion and quality audits of the data into the RAW layer of the Data Lake.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Spark SQL, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka, and Storm.
- Excellent understanding of Hadoop distributed system architecture and design principles.
- Experience in analyzing data using Hive, Pig Latin, and HBase.
- Excellent knowledge of distributed storage (HDFS) and distributed processing (MapReduce, YARN).
- Hands-on experience in developing MapReduce programs according to requirements.
- Hands-on experience in data cleaning and preprocessing using Python and the Talend Data Preparation tool.
- Hands-on Data Warehousing experience with Extraction, Transformation and Loading (ETL) processes using Talend Open Studio for Data Integration and Big Data.
- Expertise in optimizing network traffic using Combiners, joining multiple-schema datasets using Joins, and organizing data using Partitioners and Buckets.
- Expertise with NoSQL databases such as HBase and MapR-DB.
- Expertise in using ETL tools such as Talend and SSIS (data migration, cleansing).
- Expertise in working with different kinds of data files, such as XML and JSON, as well as databases.
- Hands-on experience importing and exporting data between HDFS and databases such as MySQL, Oracle, Teradata, and DB2 using Sqoop.
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregation.
- Strong command of Hive and Pig core functionality; wrote Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Good knowledge of database connectivity (JDBC) for databases such as Oracle, DB2, SQL Server, and MySQL.
- Worked extensively with different Hadoop distributions such as CDH, Hortonworks, and MapR.
- Strong experience building real-time streaming applications and batch-style large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce, and Hive (a streaming sketch follows this summary).
- Proficient with IDEs such as Eclipse and PyCharm.
- Hands-on experience with job workflow scheduling and monitoring tools such as Oozie and Azkaban.
- Worked on the Spark ecosystem, including Spark SQL, SparkR, PySpark, and Spark Streaming, for batch processing of applications that produce large data sets and for executing complex workflows.
- Excellent programming skills at a high level of abstraction using Python and Spark.
- Worked on the Spark ecosystem with Python as the primary programming language.
- Good understanding of real-time data processing using Spark.
- Experience in all phases of the software development life cycle.
- Extensive experience using Oracle, SQL Server, DB2, and MySQL databases.
- Extensive experience developing software following Agile methodologies.
- Supported development, testing, and operations teams during new system deployments.
- Very good knowledge of SAP BI (ETL) and data warehouse tools.
- Good experience creating real-time data streaming solutions using Kafka.
- Hands-on experience with UNIX scripting.
- Responsible for designing and implementing ETL processes using Talend to load data from different sources.
- Experience with cloud configuration in Amazon Web Services (AWS).
- Good knowledge of web services.
- Built reports using SSRS, SAP BO, and Tableau.
- Hands-on experience with ETL tools such as SSIS and Talend and with reporting tools such as SSRS and Tableau.
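A minimal PySpark sketch of the kind of Kafka-based streaming ingestion referenced in this summary, assuming Spark 2.x with the spark-sql-kafka package available; the broker addresses, topic name, and HDFS paths are illustrative placeholders, not values from any engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Illustrative sketch: stream events from a Kafka topic into an HDFS raw zone.
spark = (SparkSession.builder
         .appName("kafka-raw-ingest")
         .getOrCreate())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
       .option("subscribe", "events_topic")                              # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; keep the payload and ingestion timestamp.
events = raw.select(col("value").cast("string").alias("payload"),
                    col("timestamp").alias("ingest_ts"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/raw/events")           # landing (RAW) layer, placeholder path
         .option("checkpointLocation", "/chk/events")  # required for fault tolerance
         .outputMode("append")
         .start())

query.awaitTermination()
```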
TECHNICAL SKILLS:
Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, Yarn, Pig, Hive, HBase, Sqoop, Flume, Oozie, Zookeeper, Cassandra, Kafka, Python, Spark
Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting.
Web Services: MVC, SOAP, REST
RDBMS: Oracle 10g, MySQL, SQL Server, DB2.
NoSQL: HBase, MapR-DB, Cassandra
Databases: Oracle, Teradata, DB2, MS Azure.
ETL tools: Talend, SSIS
Methodologies: Agile, Waterfall, Test-Driven Development
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Big Data Hadoop Developer
Responsibilities:
- Involved in designing the end-to-end data pipeline to ingest data into the Data Lake.
- Developed a canonical model on top of the Hadoop Data Lake and enabled consumption layers for business end users.
- Developed Talend jobs to load data from various sources such as Oracle, Teradata, and SQL Server.
- Worked on handling different data formats including XML, JSON, ORC, and Parquet.
- Strong experience with Hadoop file system and Hive commands for data mappings.
- Worked on Slowly Changing Dimensions in Hive and implemented exception logic to capture corrupted data.
- Developed Hive queries to load data into the different layers (stg, hub, and pub), where each source has its own use cases.
- Developed PySpark code to load data from the stg layer to the hub layer, implementing the business logic.
- Developed Spark SQL code implementing business logic, with Python as the programming language.
- Designed, developed, and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the pub layer of the Data Lake.
- Worked with sequence files, map-side joins, bucketing, and partitioning for Hive performance and storage improvements.
- Responsible for developing unit test cases from the Kafka producer to the consumer and for establishing the connection parameters from JMS to Kafka.
- Worked on ingestion from RDBMS sources using Sqoop and Flume.
- Worked on designing the Hive table structures, including creation, partitioning, and transformations over the data based on business requirements, and enriched data across the three-layered BI Data Lake.
- Developed documentation for individual modules such as Application, Hive, Operations, and Housekeeping, covering system integration and error handling.
- Worked on Hive data modelling and dedicated a separate data hub for multiple Hadoop datasets.
- Transformed the existing ETL logic to Hadoop mappings.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on CDC logic in Hive to capture deletes, updates, and inserts (a sketch follows this list).
- Developed Oozie workflows for the execution process.
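A minimal PySpark sketch of the stg-to-hub CDC step described above; the table, key, and hash column names (stg.customer, customer_key, row_hash) are hypothetical, and the business logic is simplified to a full outer join that flags inserts, updates, and deletes.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative CDC pass: compare the staged snapshot against the hub table and
# flag inserts (I), updates (U), and deletes (D). Names are hypothetical.
spark = (SparkSession.builder
         .appName("stg-to-hub-cdc")
         .enableHiveSupport()
         .getOrCreate())

stg = spark.table("stg.customer")   # today's staged extract
hub = spark.table("hub.customer")   # current hub state

joined = stg.alias("s").join(hub.alias("h"), on="customer_key", how="full_outer")

cdc = joined.withColumn(
    "cdc_flag",
    F.when(F.col("h.row_hash").isNull(), F.lit("I"))                # new key
     .when(F.col("s.row_hash").isNull(), F.lit("D"))                # missing from stg
     .when(F.col("s.row_hash") != F.col("h.row_hash"), F.lit("U"))  # changed row
     .otherwise(F.lit("N")))                                        # unchanged

changes = (cdc.filter(F.col("cdc_flag") != "N")
              .select("customer_key",
                      F.coalesce(F.col("s.name"), F.col("h.name")).alias("name"),
                      "cdc_flag"))

# Append the classified changes to a hub-side CDC table (hypothetical name).
changes.write.mode("append").saveAsTable("hub.customer_cdc")
```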
Environment: Hadoop frameworks, HDP 2.5, Hive 2.1, Sqoop, Python, Spark 2.1, Oracle, Teradata, SQL Server, Talend.
Confidential, Atlanta, GA
Big Data Hadoop Developer
Responsibilities:
- Developed a 10-node cluster while designing the Data Lake on the MapR distribution.
- Involved in designing the end-to-end data pipeline to ingest data into the Data Lake.
- Worked on standardization of XML from streaming data and loading the data into HDFS.
- Worked on creating Hive tables with struct data types.
- Developed Spark code to load data into Elasticsearch.
- Replicated the SQL Server schema in Hadoop without losing the business logic.
- Developed Python code to validate input XML files and separate bad data before ingestion into Hive and the Data Lake (sketched after this list).
- Designed and created Hive tables to load data and perform ETL logic.
- Responsible for developing the parser that consumes the Kafka streaming data, segregates it into Hive tables, converts it from XML to text files, and ingests it into the RAW layer.
- Designed, developed, and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the pub layer of the Data Lake.
- Developed PySpark code to load data from Hive tables into HBase.
- Migrated existing SSIS ETL flows to Hadoop.
- Created Hive tables from HBase tables based on business needs.
- Worked on versioning of data in HBase.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Used the Spark API over the MapR cluster to perform analytics on data in Hive.
- Developed Python scripts using both DataFrames/SQL and RDD/MapReduce in Spark 2.1 for data aggregation and queries and for writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, and DataFrames.
- Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
- Designed and developed Jenkins jobs to execute the Python and Hive scripts.
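A minimal Python sketch of the XML validation step mentioned above; the directory paths and required tags are hypothetical, and a real job would add schema (XSD) checks and logging before handing good files to the Hive ingestion.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Hypothetical landing, good, and quarantine directories.
INBOX = "/data/landing/xml"
GOOD = "/data/landing/good"
BAD = "/data/landing/bad"

REQUIRED_TAGS = {"OrderId", "OrderDate"}  # illustrative mandatory elements


def is_valid(path):
    """Return True if the file parses as XML and contains the required tags."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    present = {elem.tag for elem in root.iter()}
    return REQUIRED_TAGS.issubset(present)


def segregate():
    """Move well-formed files to GOOD (for Hive ingestion) and the rest to BAD."""
    for name in os.listdir(INBOX):
        if not name.endswith(".xml"):
            continue
        src = os.path.join(INBOX, name)
        dest = GOOD if is_valid(src) else BAD
        shutil.move(src, os.path.join(dest, name))


if __name__ == "__main__":
    segregate()
```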
Environment: Hadoop frameworks, MapR 6.1, MapR 5.2, Hive 2.1, Sqoop, Python, Spark 2.1, SQL Server, SSIS.
Confidential, ATLANTA
Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Hive scripts for transforming data; extensively used event joins, filtering, and pre-aggregations.
- Performed load balancing of ETL processes, database performance tuning, and capacity monitoring using Talend.
- Involved in pivoting HDFS data from rows to columns and columns to rows (illustrated after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Used Sqoop to import customer information from a SQL Server database into HDFS for data processing.
- Developed shell scripts for scheduling and automating the job flow.
- Developed a workflow using Talend to automate the tasks of loading the data into HDFS.
- Developed Sqoop scripts for importing and exporting data to and from HDFS and Hive.
- Responsible for processing ingested raw data using Kafka and Hive.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Involved in loading data from the UNIX file system into HDFS.
- Handled importing data from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS.
- Worked extensively on the Spark Core and Spark SQL modules.
- Involved in exporting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get / copyToLocal commands.
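A minimal PySpark sketch of the row-to-column pivot referenced above; the usage dataset, its column names, and the HDFS paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pivot-example").getOrCreate()

# Hypothetical long-format data on HDFS: one row per (customer, month, amount).
long_df = spark.read.parquet("/data/usage/monthly")

# Rows -> columns: one row per customer, one column per month.
wide_df = (long_df.groupBy("customer_id")
           .pivot("month")
           .agg(F.sum("amount")))
wide_df.write.mode("overwrite").parquet("/data/usage/monthly_wide")

# Columns -> rows (unpivot) via a stack() expression over the month columns.
months = [c for c in wide_df.columns if c != "customer_id"]
stack_expr = "stack({n}, {pairs}) as (month, amount)".format(
    n=len(months),
    pairs=", ".join("'{m}', `{m}`".format(m=m) for m in months))
back_to_long = wide_df.select("customer_id", F.expr(stack_expr))
back_to_long.show(5)
```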
Environment: Hadoop, YARN, Hive, Pig, Spark, PySpark, Talend, Oozie, Sqoop, Flume, AWS, Redshift
Confidential, Duluth, GA
Hadoop/Spark Developer
Responsibilities:
- Developed a 16-node cluster while designing the Data Lake on the Hortonworks distribution.
- Worked on building a channel from Sonic JMS to consume messages from the SOAP application.
- Responsible for data audit and data quality; first point of contact for issues in data from different sources.
- Involved in designing the end-to-end data pipeline to ingest data into the Data Lake.
- Developed a Kafka producer that brings the data streams from the JMS client and passes them to the Kafka consumer (see the producer/consumer sketch after this list).
- Worked on programming the Kafka producer and consumer with the connection parameters and methods, from Oracle Sonic JMS all the way to the Data Lake (HDFS).
- Developed Python code to validate input XML files and separate bad data before ingestion into Hive and the Data Lake.
- Responsible for developing the parser that consumes the Kafka streaming data, segregates it into Hive tables, converts it from XML to text files, and ingests it into the RAW layer.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries and for writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Python.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Python.
- Developed code for reading multiple data formats on HDFS using PySpark.
- Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
- Utilized Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions, and table calculations to build dashboards.
- Completed POCs on cloud-based technologies such as Azure and AWS.
- Developed complex dashboards using Tableau table calculations, quick filters, context filters, hierarchies, parameters, and action filters.
- Published customized interactive reports and dashboards and scheduled reports using Tableau Server.
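A minimal Python sketch of the producer/consumer pattern described above (the JMS-to-Kafka bridge), using the kafka-python client as an assumption; the broker list, topic name, and output path are placeholders, and the JMS read itself is stubbed out.

```python
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092", "broker2:9092"]  # placeholder broker list
TOPIC = "jms_xml_events"                    # placeholder topic name


def publish(messages):
    """Forward messages pulled from the JMS client (stubbed here) to Kafka."""
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    for msg in messages:
        producer.send(TOPIC, value=msg.encode("utf-8"))
    producer.flush()


def consume_to_raw(out_path="/tmp/raw_events.txt"):
    """Consume from the topic and append payloads to a local staging file
    (a real job would land them in the HDFS RAW layer instead)."""
    consumer = KafkaConsumer(TOPIC,
                             bootstrap_servers=BROKERS,
                             group_id="raw-ingest",
                             auto_offset_reset="earliest")
    with open(out_path, "a") as out:
        for record in consumer:
            out.write(record.value.decode("utf-8") + "\n")


if __name__ == "__main__":
    publish(["<Order><OrderId>1</OrderId></Order>"])  # sample XML payload
    # consume_to_raw()  # run separately as a long-lived consumer
```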
Environment: Hadoop frameworks, HDP 2.5.3, Kafka, Hive, Sqoop, Python, Spark, shell scripting, Oracle Sonic JMS, Java 8.0 & 7.0, Eclipse, Tableau.
Confidential, ATLANTA
SSIS/Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data; extensively used event joins, filtering, and pre-aggregations.
- Developed Pig UDFs for functionality not available out of the box in Apache Pig.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience developing Hive DDL to create, alter, and drop Hive tables.
- Developed Hive UDFs.
- Implemented Kafka for streaming data and filtered and processed the data.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate total data usage by commercial routers in different locations, and MapReduce programs for data sorting in HDFS (a streaming sketch follows this list).
- Monitored the cluster (jobs and performance) and fine-tuned it when necessary using tools such as Cloudera Manager and Ambari.
- Performed load balancing of ETL processes and database performance tuning using ETL processing tools.
- Loaded data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Developed Pig Latin scripts to aggregate the business clients' log files, using Kibana.
- Performed data scrubbing and processing, with Oozie used for workflow automation and coordination.
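A minimal Hadoop Streaming sketch in Python of the router-usage MapReduce job mentioned above; the input layout (router id, location, bytes as tab-separated fields) is an assumption. It would typically be submitted with the standard streaming jar, e.g. hadoop jar hadoop-streaming.jar -files usage.py -mapper "python usage.py map" -reducer "python usage.py reduce" -input /data/router_logs -output /data/router_usage (paths illustrative).

```python
#!/usr/bin/env python
"""Hadoop Streaming job: total data usage (bytes) per router location.

Assumes tab-separated input lines of the form: router_id<TAB>location<TAB>bytes
"""
import sys


def mapper():
    # Emit (location, bytes) pairs; the framework sorts by key before reduce.
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed records
        _, location, num_bytes = parts[:3]
        print("%s\t%s" % (location, num_bytes))


def reducer():
    # Sum bytes per location; keys arrive grouped because the input is sorted.
    current, total = None, 0
    for line in sys.stdin:
        location, num_bytes = line.rstrip("\n").split("\t")
        if location != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = location, 0
        total += int(num_bytes)
    if current is not None:
        print("%s\t%d" % (current, total))


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```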
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera, HDFS, Eclipse.
Confidential, San Ramon, CA
SQL/Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data; extensively used event joins, filtering, and pre-aggregations.
- Developed Pig UDFs for functionality not available out of the box in Apache Pig (see the UDF sketch after this list).
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience developing Hive DDL to create, alter, and drop Hive tables.
- Developed Hive UDFs.
- Implemented Kafka for streaming data and filtered and processed the data.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Monitored the cluster (jobs and performance) and fine-tuned it when necessary using Cloudera Manager.
- Performed load balancing of ETL processes and database performance tuning using ETL processing tools.
- Loaded data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Performed data scrubbing and processing, with Oozie used for workflow automation and coordination.
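The Pig UDFs above were written in Java, but Pig also supports Python (Jython) UDFs; the sketch below is a hypothetical example of that route, with the function names and schemas chosen purely for illustration.

```python
# udfs.py -- a Python (Jython) UDF module for Apache Pig.
# The @outputSchema decorator is provided by Pig's Jython runtime when the
# script is registered with: REGISTER 'udfs.py' USING jython AS myudfs;

@outputSchema("location:chararray")
def normalize_location(raw):
    """Trim and upper-case a free-text location field (hypothetical example)."""
    if raw is None:
        return None
    return raw.strip().upper()


@outputSchema("gb:double")
def bytes_to_gb(num_bytes):
    """Convert a byte count to gigabytes for reporting."""
    if num_bytes is None:
        return None
    return float(num_bytes) / (1024.0 ** 3)
```

In Pig Latin the functions could then be invoked as, for example, FOREACH logs GENERATE myudfs.normalize_location(location), myudfs.bytes_to_gb(num_bytes);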
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera, HDFS, Eclipse.
Confidential
SQL Developer
Responsibilities:
- Interacted with the team on analysis and design, and developed the database using ER diagrams, normalization, and relational database concepts.
- Involved in design, development, and testing of the system.
- Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
- Developed user-defined functions and created views.
- Created triggers to maintain referential integrity.
- Implemented exception handling.
- Worked on client requirements and wrote complex SQL queries to generate Crystal Reports.
- Created and automated regular jobs.
- Tuned and optimized SQL queries using Execution Plan and Profiler.
- Developed the user interface (view component of the MVC architecture) with JSP, Struts custom tag libraries, HTML5, and JavaScript.
- Used the Dojo toolkit to construct Ajax requests and build dynamic web pages using JSPs, DHTML, and JavaScript; extensively used jQuery in web-based applications.
- Developed the controller component with Servlets and action classes.
- Developed business components (model components) using Enterprise Java Beans (EJB).
- Established schedule and resource requirements by planning, analyzing, and documenting development effort, including timelines, risks, test requirements, and performance targets.
- Analyzed system requirements and prepared the system design document.
- Developed dynamic user interfaces with HTML and JavaScript using JSP and Servlet technology.
- Designed and developed a subsystem in which Java Message Service (JMS) applications communicate with MQ for data exchange between different systems.
- Used JMS elements for sending and receiving messages.
- Used Hibernate for mapping Java classes to database tables.
- Created and executed test plans using Quality Center (Test Director).
- Mapped requirements to test cases in Quality Center.
- Supported system testing and user acceptance testing.
- Rebuilt indexes and tables as part of the performance tuning exercise.
- Involved in performing database Backup and Recovery.
- Worked on documentation using MS Word.
Environment: SQL Server 7.0/2000, SQL, T-SQL, BCP, Visual Basic 6.0/5.0, Crystal Reports 7/4.5, Java, J2EE, JDBC, EJB, JSP, EL, JSTL, JUnit, XML.