
Sr. Hadoop Big Data Developer Resume

CA

PROFESSIONAL SUMMARY:

  • Over 10 years of experience in designing, developing, and maintaining large business applications involving data migration, integration, conversion, and data warehousing.
  • Well versed with developing and implementing MapReduce jobs using Hadoop to work with Big Data.
  • Experienced with the Spark processing framework, including Spark Core and Spark SQL; experience with NoSQL databases such as HBase and MongoDB.
  • Experience in importing and exporting data using Sqoop between Confidential and relational database systems (RDBMS) such as Teradata.
  • Skilled in creating Oozie workflows for scheduled (cron-style) jobs; strong experience in Hadoop administration and Linux.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Hands-on experience in Perl scripting and Python; extensive experience with SQL, PL/SQL, and database concepts.
  • Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
  • Good experience working with distributions such as MapR, Hortonworks, and Cloudera.
  • Experience in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development, testing, and implementation of enterprise-level data marts and data warehouses.
  • Good knowledge of Hadoop administration: cluster configuration (single-node and multi-node), DataNode commissioning and decommissioning, NameNode backup and recovery, HBase, Confidential, and Hive configuration, cluster monitoring, and access control lists.
  • Expertise in databases such as Oracle 10g/11g and MySQL, with hands-on experience in database programming with SQL and PL/SQL.
  • Knowledge of NoSQL databases such as HBase; experience with job workflow schedulers like Oozie.
  • Experience in creating data pipelines to move data between RDBMS and Confidential for improved business intelligence and reporting.
  • Experienced in application design using Unified Modeling Language (UML), sequence diagrams, use case diagrams, entity relationship diagrams (ERD), and data flow diagrams (DFD).
  • Designed and developed Informatica mappings enabling the extraction, transformation, and loading of data into target tables in Teradata.
  • Created workflows, worklets, and tasks using Workflow Manager to schedule loads Confidential the required frequency, and passed the data to Microsoft SharePoint.
  • Designed and developed Informatica mappings for data loads and data cleansing.
  • Created complex mappings using Aggregator, Expression, and Joiner transformations.
  • Involved in generating reports from Data Mart using OBIEE and working with Teradata.
  • Experience in importing and exporting data between different RDBMSs, such as MySQL and Oracle, and Confidential and Hive using Sqoop.
  • Very good experience with Hadoop and Pig and mid-level usage of Hive, Sqoop, and YARN; designed and implemented Hive jobs to support distributed processing of large data sets on the Hadoop cluster.
  • Experience in the design, development, and maintenance of NoSQL databases like HBase.
  • Knowledge of report development using Business Objects, Cognos, and MicroStrategy.

TECHNICAL SKILLS:

Data Warehousing: Informatica PowerCenter, PowerExchange for DB2, Metadata Reporter, Data Profiling, Data Cleansing, Star & Snowflake Schema, Fact & Dimension Tables, Physical & Logical Data Modeling, DataStage, Erwin

Big Data Tools: Apache Hadoop (Confidential, MapReduce), Hive, HBase, Flume, Sqoop, Pig, Oozie, ZooKeeper, Cloudera’s Distribution including Apache Hadoop (CDH) and administration

Business Intelligence Tools: Business Objects, Cognos

Databases: MS SQL Server, Oracle, Sybase, Teradata, MySQL, MS-Access, DB2

Database Tools: SQL*Plus, SQL*Loader, Export/Import, TOAD, SQL Navigator, SQL Trace

Development Languages: C, C++, XML, SQL, Confidential-SQL, PL/SQL, UNIX Shell Scripting

Other Tools and Technologies: MS Visual Source Safe, PVCS, Autosys, crontab, Mercury Quality center

PROFESSIONAL EXPERIENCE:

Confidential, CA

Sr. Hadoop Big Data Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and Confidential; developed several MapReduce jobs in Java for data cleaning and preprocessing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Worked on cluster installation, DataNode commissioning and decommissioning, NameNode recovery, capacity planning, and slot configuration.
  • Worked on many financial modules in ERPs, such as Sales, Purchasing, General Ledger, and Inventory.
  • Also worked heavily on multi-funds, hedge funds, credit cards, etc.
  • Worked on moving all log files generated from various sources to Confidential for further processing.
  • Developed workflows using mid-level custom MapReduce, Pig, Hive, and Sqoop.
  • Implemented Flume to collect data from various sources and load it into Confidential.
  • Tuned the cluster for optimal performance to process these large data sets.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Responsible for loading data from the UNIX file system into Confidential.
  • Used Sqoop to import data from different database sources and file systems into Confidential and vice versa.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using an MR unit-testing library.
  • Developed workflows in Control-M to automate loading data into Confidential and preprocessing it with Pig, and performed extensive data migration with Big Data.
  • Used Maven extensively to build JAR files for MapReduce programs and deployed them to the cluster.
  • Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
  • Integrated a BI tool with Impala and worked on Big Data analysis.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into Confidential using Sqoop imports.
  • Developed Sqoop scripts to import and export data from relational sources, handling incremental loads of customer and transaction data by date.
  • Working knowledge of Git and Ant/Maven for project dependency management, build, and deployment.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked as part of the AWS build team; created, configured, and managed S3 buckets (storage).
  • Experience with AWS EC2, EMR, Lambda, and CloudWatch.
  • Imported data from different sources like Confidential/HBase into Spark RDDs.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions to carry out business analysis (a brief PySpark sketch follows this list).
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Optimized MapReduce jobs to use Confidential efficiently by applying various compression mechanisms.
  • Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
  • Responsible for analyzing and cleansing raw data by running Hive/Impala queries and Pig scripts on the data.
  • Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
  • Involved in troubleshooting and performance tuning of Hadoop clusters.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
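
For illustration, a minimal PySpark sketch of the kind of RDD-plus-Spark SQL work described above. The Spark development on this project was done in Scala; this sketch is in Python only to keep the examples in this document in one language, and the HDFS path, field layout, and query are hypothetical.

```python
# Illustrative PySpark sketch: load delimited records from HDFS into an RDD,
# apply transformations/actions, and query the result with Spark SQL.
# Paths, field names, and the query itself are hypothetical.
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("txn-analysis-sketch").getOrCreate()
sc = spark.sparkContext

# RDD transformations: parse pipe-delimited transaction records and drop bad rows.
raw = sc.textFile("hdfs:///data/raw/transactions/*")              # hypothetical path
parsed = (raw.map(lambda line: line.split("|"))
             .filter(lambda f: len(f) == 3)
             .map(lambda f: Row(cust_id=f[0], category=f[1], amount=float(f[2]))))

# Action: count the cleansed records.
print("parsed records:", parsed.count())

# Register as a DataFrame so a HiveQL-style query runs through Spark SQL.
df = spark.createDataFrame(parsed)
df.createOrReplaceTempView("transactions")
top = spark.sql("""
    SELECT category, SUM(amount) AS total_amount
    FROM transactions
    GROUP BY category
    ORDER BY total_amount DESC
    LIMIT 10
""")
top.show()
```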

Environment: Hadoop, Big Data, MapReduce, HiveQL, MySQL, HBase, Confidential, Hive, Impala, Pig, Sqoop, Oozie, Flume, Cloudera, ZooKeeper, Hue Editor, Eclipse (Kepler), Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, UNIX, Tableau, Control-M.

Confidential

Sr. Hadoop developer

Responsibilities:

  • Participated in modeling, estimation, requirement analysis, and design of the mapping document, and in planning with ETL and BI tools, MDM, and Toad across various source environments.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed Sqoop jobs to move inbound files to the Confidential file location based on monthly, weekly, daily, and hourly partitioning.
  • Worked on data collection, processing, and analytics; worked with the DCAE Controller to contribute network policies and manage lifecycle events.
  • Developed ETL jobs to load data coming from various sources, such as mainframes and flat files, into a data warehouse.
  • Implemented Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
  • Involved in creating data models for customer data using Hive Query Language; developed multiple MapReduce jobs in Python for data cleaning and preprocessing.
  • Wrote Pig scripts to handle semi-structured data as structured data and to insert data into HBase from Confidential.
  • Supported HBase architecture design with the Hadoop architect team to develop a database design in Confidential.
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive, and Pig scripts.
  • Wrote Python scripts using the Hadoop Streaming API to maintain data extraction.
  • Wrote Python UDFs for Hive queries to perform data analysis and meet business requirements.
  • Involved in creating Hive tables and loading them with data and writing Hive queries.
  • Involved in importing data from MySQL tables into Confidential and HBase tables using Sqoop.
  • Streamed data in real time using Spark with Kafka for faster processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to Confidential using Scala and Python (a brief sketch follows this list).
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Involved in POCs comparing the performance of Spark SQL with Hive.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Worked on a live Hadoop cluster running Cloudera CDH 5.x; participated in requirement analysis and creation of data solutions using Hadoop.
  • Involved in analyzing system specifications, designing and developing test plans.
  • Worked on the ingestion of web log data into the Hadoop platform and on extensive data integration using Big Data.
  • Participated Confidential assigned user conferences, user group meetings, internal meetings, prioritization, and production work-list calls.
  • Well conversant with software testing methodologies, including developing design documents, test plans, test scenarios, test cases, and documentation.
  • Prepared troubleshooting documents and performed data manipulation with Big Data.
  • Created and executed SQL queries on an Oracle database to validate and test data.
  • Performed functional, regression, system testing, interface testing, integration testing and acceptance testing.
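
A minimal PySpark sketch of the Kafka-to-Confidential streaming flow described above. It uses the Structured Streaming API (the classic DStream flow would go through KafkaUtils instead); the broker addresses, topic, output paths, and the presence of the spark-sql-kafka connector on the classpath are all assumptions.

```python
# Illustrative PySpark Structured Streaming sketch: subscribe to a Kafka topic
# and persist the stream to HDFS for downstream Hive/Spark SQL analysis.
# Brokers, topic name, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Read each Kafka message value as a line of text, keeping its event timestamp.
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical
               .option("subscribe", "clickstream")                               # hypothetical topic
               .load()
               .selectExpr("CAST(value AS STRING) AS event_json",
                           "timestamp AS event_time"))

# Append the raw stream to HDFS as Parquet files.
query = (events.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/streaming/clickstream")             # hypothetical path
               .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
               .outputMode("append")
               .start())

query.awaitTermination()
```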

Environment: Cloudera, Spark SQL, Spark Streaming, Pig, Hive, Flume, Oozie, Java, Python, Scala, Eclipse, ZooKeeper, Cassandra, HBase, Sqoop, GitHub, Docker.

Confidential, Columbus, Ohio

Senior Developer/Hadoop-Big data/ETL

Responsibilities:

  • Interacted with business community and gathered requirements based on changing needs. Incorporated identified factors into Informatica mappings to build the Data Mart.
  • Ingested historical medical claims data into Confidential; Hive external tables were used for raw data and managed tables for intermediate data.
  • Developed Hive scripts (HQL) to automate joins across different sources; migrated ETL processes from MySQL to Hadoop using Pig scripting and Pig UDFs as a data pipeline for easy data manipulation.
  • Developed MapReduce programs and migrated data from existing data sources using Sqoop (an equivalent Hadoop Streaming sketch in Python follows this list).
  • Devised schemes to collect and stage large volumes of data in Confidential, and worked on compressing the data in various formats to achieve optimal storage capacity.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
  • Involved in writing MapReduce code in Java; developed custom Writable Java programs to load data into HBase.
  • Wrote subqueries, stored procedures, triggers, cursors, and functions on an Oracle database. Used AWS to run tasks and store data using the Hadoop Distributed File System (Confidential).
  • Involved in resolving defects found while testing new and existing applications; developed shell scripts to add process dates to source files and to create trigger files.
  • Developed various Big Data workflows using Oozie; analyzed business requirements and identified mapping documents required for system and functional testing efforts across all test scenarios.
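
The MapReduce work above was written in Java; purely for illustration (and to keep this document's examples in one language), here is the same clean-and-count pattern sketched via Hadoop Streaming in Python. The claim-record field layout and the invalid-row rule are hypothetical.

```python
# Illustrative Hadoop Streaming sketch in Python (mapper and reducer shown together).
# In practice mapper.py and reducer.py are separate files passed to
# `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...`.
import sys

def mapper():
    """Emit (claim_status, 1) for well-formed claim records read from stdin."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 5 or not fields[0]:      # drop malformed rows / missing claim id
            continue
        status = fields[4].strip().upper()        # hypothetical: 5th column is claim status
        print(f"{status}\t1")

def reducer():
    """Sum the counts per status key (Hadoop sorts mapper output by key)."""
    current_key, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{total}")
            total = 0
        current_key = key
        total += int(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```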

Environment: Hadoop, Hive, MapReduce, Confidential, Sqoop, HBase, Pig, Oozie, AWS, Java, Bash, MySQL, Oracle, Windows, and Linux.

Confidential, Lansing, MI

Senior Developer/Hadoop-Big data/ETL

Responsibilities:

  • Interacted with business community and gathered requirements based on changing needs.
  • Responsible for importing data to Confidential using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile devices, using Apache Flume, and stored the data in Confidential/HBase for analysis.
  • Developed an automated shell-script process that drives data pulls from RDBMSs into Hadoop using Sqoop.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats, such as JSON and XML.
  • Developed automated processes for flattening upstream data from Cassandra, which arrives in JSON format; used Hive UDFs to flatten the JSON data (a brief Python sketch of this flattening follows this list).
  • Highly involved in creating Hive tables and views for the pulled data.
  • Developed Hive queries for performing DQ checks on the data loaded into Confidential.
  • Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
  • Visualized Confidential data for customers using a BI tool via the Hive ODBC driver.
  • Designed test cases to test connectivity to various RDBMSs.
  • Conduct Knowledge Transfer (KT) sessions on the business value and technical functionalities incorporated in the developed modules for new recruits.
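
The flattening above was implemented with Hive UDFs; an equivalent approach, sketched here in Python so the examples in this document stay in one language, is a small script invoked from Hive with SELECT TRANSFORM(...) USING 'python flatten_json.py'. The column names and the nesting shape of the Cassandra export are hypothetical.

```python
# Illustrative JSON-flattening script for use with Hive's streaming TRANSFORM clause.
# Hive streams one JSON document per line on stdin; emit tab-separated columns.
import json
import sys

def flatten(prefix, obj, out):
    """Flatten nested dicts into dot-separated keys, e.g. address.city."""
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flatten(name, value, out)
        else:
            out[name] = value

COLUMNS = ["id", "customer.name", "address.city", "order.total"]   # hypothetical columns
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        record = json.loads(line)
    except ValueError:
        continue                      # skip unparseable rows instead of failing the job
    flat = {}
    flatten("", record, flat)
    print("\t".join(str(flat.get(col, "")) for col in COLUMNS))
```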

Environment: Hadoop 2.2.0, Map Reduce, Hive, Pig, HBase, Oozie, Sqoop, Flume, Core Java, Cloudera Distributed Hadoop (CDH), Confidential, RDBMS, JSON, XML.

Confidential, Lansing, MI

Sr. ETL developer

Responsibilities:

  • Created mappings connecting the XMLs with payload DB2 databases and WPS tables in the event store databases.
  • Created a successful integration of flat files, web services, and databases across the network, and performed dimensional modeling.
  • Coordinated offshore ETL development for the EDW and web logs; deliverables included ERDs, data models, data flow diagrams, use cases, gap analysis, and process flow documents, applying an expert understanding of Ralph Kimball's methodology.
  • Involved in the analysis, design, coding, and testing of the application.
  • Designed and developed ELT (extract, load, and transform) solutions for bulk transformations of client data coming from mainframe DB2, and performed data analysis using advanced techniques.
  • Used Informatica DT Studio components such as Parser and Serializer, and created customized XML schemas configured via the Unstructured Data transformation in Informatica.
  • Worked with Informatica Data Director and DT Studio projects; created DT Studio scripts that were uploaded to the server to modify existing Informatica schemas via the Unstructured Data transformation, and handled governance using the Informatica Data Director tool.
  • Worked extensively with ODI Designer, Operator, and Metadata Navigator.
  • Good understanding of ODI architecture and of installing Topology, Security Manager, agents, Designer, Operator, and the master and work repositories.
  • Created and modified PL/SQL Triggers, Procedures, Functions and packages.
  • Developed ER Diagrams, Data flow diagrams based on the requirement.
  • Developed SQL scripts to create database objects like tables, views and sequences.
  • Used SQL*Loader to load bulk data from various flat files and legacy systems.
  • Developed SQL and PL/SQL scripts for transferring data between databases (a brief Python sketch of this bulk-transfer pattern follows this list).
  • Designed and developed complex reports to meet end user requirements and deployed using Oracle Report 10g.
  • Developed complex SQL queries, triggers for building reports using Form, form letter and mailing label report styles.
  • Designed and developed user interfaces using Oracle Forms.
  • Proactively tuned SQL queries and performed refinement of the database design leading to significant improvement of system response time and efficiency.
  • Used FORALL and BULK COLLECT to fetch and process large volumes of data from tables.
  • Performed unit testing and supported integration testing and end user testing.
  • Involved in logical and physical database design; identified fact tables and transaction tables.
  • Involved in SQL tuning, PL/SQL tuning and Application tuning using various tools like TKPROF, EXPLAIN PLAN, DBMS PROFILER etc.
  • Developed Reports, Menus, Object Libraries and PL/SQL Library using Oracle Reports Developer.
  • Created group, tabular, and form reports.
  • Provided technical support to the user with regard to operational aspects.
  • Designed and developed ODI mappings using Oracle Data Integrator, enabling the extraction, transformation, and loading of data into target tables.
  • Created workflows, worklets, and tasks using Workflow Manager to schedule loads Confidential the required frequency, and passed the data to Microsoft SharePoint.
  • Created complex mappings using Aggregator, Expression, and Joiner transformations, and also worked on administration of Oracle 11g and Oracle 10g.
  • Used Source Analyzer and Warehouse designer to import the source and target database schemas, and the Mapping Designer to map the sources to the target.
  • Performed Configuration Management to migrate ODI mappings from Development to Test to production environment.
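
The database-to-database transfers above were implemented in PL/SQL using BULK COLLECT and FORALL. As an analogous illustration in Python (the single language used for examples in this document), the same batched fetch-and-insert pattern might look like the sketch below; connection strings, table, and columns are hypothetical.

```python
# Illustrative batched transfer between two Oracle databases with cx_Oracle.
# fetchmany/executemany mirror the bulk-fetch / bulk-bind idea of BULK COLLECT / FORALL.
import cx_Oracle

BATCH_SIZE = 10_000

src = cx_Oracle.connect("scott/tiger@source-db")   # hypothetical credentials/DSN
dst = cx_Oracle.connect("scott/tiger@target-db")

read_cur = src.cursor()
read_cur.arraysize = BATCH_SIZE                    # fetch rows from the source in bulk
read_cur.execute("SELECT order_id, cust_id, amount, order_date FROM orders")

write_cur = dst.cursor()
insert_sql = ("INSERT INTO orders_stg (order_id, cust_id, amount, order_date) "
              "VALUES (:1, :2, :3, :4)")

while True:
    rows = read_cur.fetchmany(BATCH_SIZE)
    if not rows:
        break
    write_cur.executemany(insert_sql, rows)        # bulk bind, one round trip per batch
    dst.commit()

read_cur.close(); write_cur.close()
src.close(); dst.close()
```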

Environment: Informatica, Oracle 11g, Oracle Data Integrator, PL/SQL, SQL, Forms 10g & Reports 10g, TOAD 9.5, Shell Scripting, SQL*Loader, Perl, XML, and Windows XP.

Confidential, Seattle, WA

ETL developer

Responsibilities:

  • Involved in the full project life cycle, from analysis through production implementation and support, with emphasis on identifying sources and validating source data, developing the required logic and transformations, creating mappings, and loading the data into the business intelligence database.
  • Created SSIS packages with error handling and worked with different logging methods in SSIS; heavily involved in gap analysis and data migration between legacy systems and SQL Server.
  • Worked with Confidential-SQL (DDL, DML) statements using dynamically generated SQL; developed complex stored procedures, triggers, tables, user functions, user profiles, relational database models, data integrity constraints, SQL joins, and queries.
  • Developed SSIS packages using the Foreach Loop container in the control flow to process all Excel files within a folder and the File System Task to move each file into an archive and delete it after processing (a brief Python sketch of this pattern follows this list).
  • Used various SSIS tasks such as Conditional Split, Derived Column, and Script Task (Visual Basic) for data scrubbing and data validation checks during staging, before loading the data into the databases.
  • Used various SSIS transformations such as Lookup, Fuzzy Grouping, and Row Count.
  • Scheduled SSIS package jobs using Autosys (r11) and created custom dashboards using Tableau.
  • Involved in NDM (Network Data Mover) processing using Windows shell scripts.
  • Involved in deploying SSIS packages into production; used package configurations to export package properties and make packages environment-independent, and used C#.NET, VB.NET, and ASP.NET to configure several Windows applications.
  • Gathered report requirements and determined the best way to deliver results, either as a Reporting Services report or through an automation services framework built with VB scripting practices.
  • Wrote complex SQL queries and stored procedures to create reports using SSRS 2008.
  • Generated parameterized and drill-down reports using SSRS 2008.
  • Created subscriptions and data-driven subscriptions using snapshots to improve the performance of Reporting Services 2008.
  • Used Informatica Designer in Informatica to create complex mappings using different transformations like Filter, Router, Connected & Unconnected lookups, Stored Procedure, Joiner, Update Strategy, Expressions and Aggregator transformations to pipeline data to Data Mart.
  • Developed functional and technical documentation to assist in the design, development and/or maintenance of deliverables, created detailed reports in Cognos report studio.
  • Experienced in monitoring and tuning SQL Server and database performance using SQL Profiler, Index Tuning Wizard and Windows Performance Monitor.
  • Developed and maintained system documentation, diagrams and flowcharts.
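
The foreach-file flow above was built with SSIS (Foreach Loop container plus File System Task). For illustration only, the same process-then-archive pattern sketched in Python: every Excel file in an inbound folder is loaded to a staging table and then moved to an archive. Folder paths, the ODBC connection string, and table/column names are hypothetical.

```python
# Illustrative Python equivalent of the SSIS foreach-Excel-file pattern.
import glob
import os
import shutil

import pandas as pd
import pyodbc

INBOUND = r"C:\etl\inbound"                       # hypothetical folders
ARCHIVE = r"C:\etl\archive"
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=sqlhost;DATABASE=Staging;Trusted_Connection=yes")   # hypothetical

conn = pyodbc.connect(CONN_STR)
cursor = conn.cursor()

for path in glob.glob(os.path.join(INBOUND, "*.xlsx")):
    frame = pd.read_excel(path)                   # one sheet per file assumed
    rows = list(frame.itertuples(index=False, name=None))
    cursor.executemany(
        "INSERT INTO dbo.SalesStage (Region, Product, Amount) VALUES (?, ?, ?)",
        rows)
    conn.commit()
    # Equivalent of the SSIS File System Task: archive the processed file.
    shutil.move(path, os.path.join(ARCHIVE, os.path.basename(path)))

cursor.close()
conn.close()
```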

Environment: MS SQL Server 2008, Windows 2008, SSIS, SSRS, SQL Server Management Studio, SQL Server Business Intelligence Studio, VB.NET, Informatica, SQL Profiler

Confidential, Sacramento, California

ETL Developer

Responsibilities:

  • Worked closely with business users while gathering requirements, analyzing data and supporting existing reporting solutions.
  • Involved in gathering of business scope and technical requirements and created technical specifications.
  • Developed complex mappings, including SCD Type I, Type II, and Type III mappings, in Informatica to load data from various sources using transformations such as Source Qualifier, Lookup (connected and unconnected), Expression, Aggregator, Update Strategy, Sequence Generator, Joiner, Filter, Rank, Router, and SQL transformations (a brief sketch of the Type II logic follows this list).
  • Worked with healthcare interchange standards including HL7, CCD, and CCR.
  • Created complex mapplets for reuse; deployed reusable transformation objects such as mapplets to avoid duplication of metadata and reduce development time.
  • Created synonyms for copies of time dimensions, used the sequence generator transformation type to create sequences for generalized dimension keys, stored procedure transformation type for encoding and decoding functions and Lookup transformation to identify slowly changing dimensions.
  • Fine-tuned existing Informatica mappings for performance optimization; used MQ Series for passing distributed data and worked on PowerCenter and PowerExchange B2B.
  • Worked on Informatica Designer tools: Source Analyzer, Warehouse designer, Mapping Designer, Mapplet Designer, Transformation Developer and Server Manager to create and monitor sessions and batches.
  • Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts.
  • Involved in the development of Informatica mappings and tuned them for better performance.
  • Debugged mappings by creating logic that assigns a severity level to each error, and sending the error rows to error table so that they can be corrected and re-loaded into a target system.
  • Experience specifying and conducting system testing (e.g. unit, integration, regression, load, performance) and user acceptance testing.
  • Analyzed existing system and developed business documentation on changes required.
  • Made adjustments in Data Model and SQL scripts to create and alter tables.
  • Extensively involved in testing the system end to end to ensure the quality of the adjustments made to accommodate the source system upgrade.
  • Worked on various issues on existing Informatica Mappings to produce correct output.
  • Involved in intensive end user training (both Power users and End users in Report studio and Query studio) with excellent documentation support.
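
The SCD mappings above were built in Informatica (Lookup plus Update Strategy); the sketch below shows only the underlying Type II compare-and-version logic in Python/pandas, the single language used for examples in this document. The column names, business key, and sample data are hypothetical.

```python
# Minimal pandas sketch of SCD Type II: expire the current dimension row when a
# tracked attribute changes and insert a new current version.
import pandas as pd

LOAD_DATE = pd.Timestamp("2017-06-01")

dim = pd.DataFrame({                       # existing dimension (current rows shown)
    "cust_id": [1, 2],
    "city": ["Austin", "Denver"],
    "eff_date": [pd.Timestamp("2016-01-01")] * 2,
    "end_date": [pd.NaT, pd.NaT],
    "is_current": [True, True],
})
src = pd.DataFrame({                       # incoming source extract
    "cust_id": [1, 2, 3],
    "city": ["Austin", "Seattle", "Miami"],
})

merged = src.merge(dim[dim["is_current"]], on="cust_id", how="left",
                   suffixes=("", "_dim"))
changed = merged[merged["city_dim"].notna() & (merged["city"] != merged["city_dim"])]
new_keys = merged[merged["city_dim"].isna()]

# Expire the changed rows in the dimension.
dim.loc[dim["cust_id"].isin(changed["cust_id"]) & dim["is_current"],
        ["end_date", "is_current"]] = [LOAD_DATE, False]

# Insert new versions for changed rows and brand-new keys.
inserts = pd.concat([changed, new_keys])[["cust_id", "city"]].assign(
    eff_date=LOAD_DATE, end_date=pd.NaT, is_current=True)
dim = pd.concat([dim, inserts], ignore_index=True)

print(dim.sort_values(["cust_id", "eff_date"]))
```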

Environment: Informatica, Oracle 10g/9i, SQL, SQL Developer, Windows 2008 R2/7, Toad
