Big Data Developer Resume
Dover, NH
SUMMARY:
- Over 6 years of experience as a Big Data/Hadoop, Data Analysis, and Data Modeling professional with applied information technology.
- Proficient in Data Modeling, DW, Big Data/Hadoop, Data Integration, Master Data Management, Data Migration, Operational Data Store, and BI Reporting projects, with a deep focus on the design, development, and deployment of BI and data solutions using custom, open-source, and off-the-shelf BI tools.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2 web services, which provide fast and efficient processing for Teradata big data analytics.
- Experienced in technical consulting and end-to-end delivery, covering data modeling, data governance, and the design, development, and implementation of solutions.
- Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
- Expertise in configuring monitoring and alerting tools, such as AWS CloudWatch, according to requirements.
- Proficiency in Big Data Practices and Technologies like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, Kafka.
- Data Warehousing: full life-cycle project leadership, business-driven requirements gathering, capacity planning, feasibility analysis, enterprise and solution architecture, design, construction, data quality, profiling and cleansing, source-target mapping, gap analysis, data integration/ETL, SOA, ODA, data marts, Inmon/Kimball methodology, data modeling for OLTP, canonical modeling, and dimensional modeling for data warehouse star/snowflake design.
- Experience in BI/DW solutions (ETL, OLAP, data marts), Informatica, and BI reporting tools such as Tableau and QlikView; experienced in leading teams of application, ETL, and BI developers as well as testing teams.
- Experience in the development, support, and maintenance of ETL (Extract, Transform and Load) processes using Talend Integration Suite.
- Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager.
- Proficiency in multiple databases like MongoDB, Cassandra, MySQL, Oracle and MS SQL Server.
- Extensive use of Talend ELT, database, data set, HBase, Hive, Pig, HDFS and Sqoop components.
- Experience in installation, configuration, and administration of Informatica PowerCenter 8.x/9.1 Client/Server.
- Experienced with the Hadoop ecosystem and Big Data components including Apache Spark, Scala, Python, HDFS, MapReduce, and Kafka.
- Expertise in reading and writing data from and to multiple source systems such as Oracle, HDFS, XML, delimited files, Excel, positional files, and CSV files.
- Experience in Business Intelligence (BI) project development and implementation using the MicroStrategy product suite, including MicroStrategy Desktop/Developer, Web, OLAP Services, Administrator, and Intelligence Server.
- Logical and physical database design (tables, constraints, indexes, etc.) using Erwin, ER/Studio, TOAD Data Modeler, and SQL Modeler.
- Good understanding and hands on experience with AWS S3 and EC2.
- Good experience on programming languages Python, Scala.
- Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
- Excellent knowledge of creating reports in SAP BusinessObjects and Web Intelligence (WebI) for multiple data providers.
- Created and maintained UDB DDL for databases, table spaces, tables, views, triggers, and stored procedures. Resolved lock escalations, lock-waits and deadlocks.
- Experience in working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos, Amazon Redshift, and Azure SQL Data Warehouse.
- Extensive ETL testing experience using Informatica 9.x/8.x, Talend, and Pentaho.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
- Experience with relational (3NF) and dimensional data architectures. Experience in leading cross-functional, culturally diverse teams to meet strategic, tactical and operational goals and objectives.
- Good exposure to BI reporting with MicroStrategy 8i/9i and Tableau, SQL programming, and RDBMS platforms: Teradata, Oracle, and SQL Server.
- Expertise on Relational Data modeling (3NF) and Dimensional data modeling.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data per requirements. Practical understanding of data modeling concepts (dimensional and relational) such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables (a small star-schema sketch follows this summary).
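The dimensional-modeling items above can be illustrated with a small star-schema sketch. The Scala/Spark SQL example below is only a hypothetical illustration: the table names (dim_customer, fact_sales), columns, and Parquet storage are assumptions, not artifacts of the projects listed in this resume.

```scala
// Minimal star-schema sketch: one dimension table plus a fact table that
// references it. All names and types are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object StarSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("star-schema-sketch")
      .enableHiveSupport() // the Hive-format DDL below assumes Hive support
      .getOrCreate()

    // Dimension table: descriptive attributes keyed by a surrogate key.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS dim_customer (
        |  customer_key  BIGINT,
        |  customer_name STRING,
        |  city          STRING
        |) STORED AS PARQUET""".stripMargin)

    // Fact table: additive measures plus foreign keys to the dimensions,
    // partitioned by date so queries can prune partitions.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS fact_sales (
        |  customer_key BIGINT,
        |  amount       DOUBLE
        |) PARTITIONED BY (sale_date DATE)
        |STORED AS PARQUET""".stripMargin)

    spark.stop()
  }
}
```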
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and ZooKeeper.
Data Modeling Tools: Erwin, Oracle Designer, ER/Studio.
Cloud Platform: AWS.
NoSQL Databases: HBase, Cassandra, MongoDB.
ETL Tools: Pentaho, Informatica PowerCenter 9.6, etc.
Operating Systems: HP-UX, RedHat Linux, Ubuntu Linux and Windows.
Web/Application Servers: Apache Tomcat, WebLogic, JBoss.
Databases: Cassandra, MongoDB, DB2, SQL Server, MySQL, Teradata, Oracle 9i/10g/11g/12c.
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer.
Version Control: SVN, CVS, GIT.
Web Services: REST, SOAP.
Languages: C, Python, Scala, PL/SQL, Pig Latin, HiveQL, Unix shell scripts.
PROFESSIONAL EXPERIENCE:
Confidential, Dover, NH
Big Data Developer
Responsibilities:
- Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Worked with the Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to AWS.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL queries for data aggregation and querying, and wrote data back into the RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Ingested data into HDFS using SQOOP and scheduled an incremental load to HDFS.
- Used Hive to analyze data ingested into HBase through the Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Loaded real-time data into NoSQL databases such as Cassandra.
- Developed Pig scripts for transforming data, making extensive use of joins, filters, and pre-aggregations.
- Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Performed data scrubbing and processing with Apache NiFi and used it for workflow automation and coordination.
- Used Sqoop to import data into HDFS and Hive from Oracle database.
- Worked on the implementation of a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
- Used Talend for Big data Integration using Spark and Hadoop.
- Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
- Used Zookeeper to provide coordination services to the cluster.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Imported data from sources like HDFS/HBase into Spark RDDs.
- Developed Hive DDL to create, alter and drop Hive tables.
- Worked on BI reporting with AtScale OLAP for Big Data.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Implemented Kafka for streaming data and filtered and processed the data.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (a minimal sketch follows this list).
- Developed data pipelines using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Developed Shell scripts for scheduling and automating the job flow.
- Developed a workflow using Nifi to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate the total data usage of commercial routers in different locations and MapReduce programs for data sorting in HDFS.
- Performed load balancing of ETL processes and database performance tuning for ETL processing tools.
- Loaded data from Teradata to HDFS using the Teradata Hadoop connectors.
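The streaming items above (Kafka ingestion filtered and processed with Spark and landed in Hive/HDFS) follow the pattern sketched below. This is a minimal, hypothetical Scala example: the broker address, topic name, schema, and output paths are placeholders, and the kafka source assumes the spark-sql-kafka connector is on the classpath; it is not the project's actual code.

```scala
// Minimal sketch of a Kafka -> Spark Structured Streaming -> HDFS/Hive flow.
// Topic, broker, schema, and paths are illustrative assumptions only.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingEtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-etl-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read the raw log stream from Kafka (topic/broker names are assumptions).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "app-logs")
      .load()

    // Filter and lightly transform the payload before persisting it.
    val events = raw
      .selectExpr("CAST(value AS STRING) AS line", "timestamp")
      .filter($"line".isNotNull && length($"line") > 0)
      .withColumn("ingest_date", to_date($"timestamp"))

    // Append the processed stream as date-partitioned Parquet files.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/events")              // placeholder path
      .option("checkpointLocation", "/chk/events") // placeholder path
      .partitionBy("ingest_date")
      .start()

    query.awaitTermination()
  }
}
```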
Environment: Spark, YARN, Hive, Pig, Scala, Mahout, NiFi, Python, Hadoop, AWS, DynamoDB, Kibana, NoSQL, Sqoop, MySQL.
Confidential, Houston, TX
Big Data Engineer
Responsibilities:
- Installed Hadoop, MapReduce, HDFS and AWS components and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Implemented Spark GraphX application to analyze guest behavior for data science segments.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance (a minimal sketch follows this list).
- Created Hive External tables to stage data and then move the data from Staging to main tables.
- Worked with AWS cloud services such as EC2, S3 and EBS, gaining substantial hands-on experience.
- Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
- Worked with NoSQL database HBase in getting real time data analytics.
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, Pig and Sqoop.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Conducted POCs for ingesting data using Flume.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Designed, developed and maintained data integration programs in Hadoop and RDBMS environment with both RDBMS and NoSQL data stores for data access and analysis.
- Used all major ETL transformations to load the tables through Informatica mappings.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Worked on Sequence files, RC files, Map side joins, bucketing, Partitioning for Hive performance enhancement and storage improvement.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Worked on managing and reviewing Hadoop log files. Tested and reported defects in an Agile Methodology perspective.
- Conducted and participated in project team meetings to gather status and discuss issues and action items.
- Provided support for research and resolution of testing issues.
- Coordinated with the business for UAT sign-off.
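The Pig/MapReduce-to-Spark migration item above refers to rewriting existing queries as DataFrame operations; a minimal sketch of that pattern is shown below. The Hive table sales and its columns are hypothetical, used only to show the equivalent DataFrame form of a simple HiveQL aggregation.

```scala
// Minimal sketch of rewriting a Hive aggregation as DataFrame operations.
// The table "sales" and its columns are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-dataframe-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent HiveQL: SELECT region, SUM(amount) AS total_amount
    //                    FROM sales GROUP BY region
    val totals = spark.table("sales")
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    totals.show()
    spark.stop()
  }
}
```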
Environment: Hadoop, Cloudera, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Yarn, NDM, Informatica, AWS, Windows & Microsoft Office.
Confidential, NYC
Sr. Data Modeler
Responsibilities:
- Served as Data Modeler/Analyst in the Data Architecture team, responsible for the conceptual, logical and physical models for the Supply Chain project.
- Created and maintained logical and physical data models for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
- Involved in designing data warehouses and data lakes on both conventional relational (Oracle, SQL Server) and high-performance big data (Hadoop Hive and HBase) databases; performed data modeling and designed, implemented and deployed high-performance custom applications at scale on Hadoop/Spark.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
- Translated business requirements into working logical and physical data models for OLTP and OLAP systems.
- Created BTEQ, FastExport, MultiLoad, TPump and FastLoad scripts for extracting data from various production systems.
- Reviewed stored procedures for reports and wrote test queries against the source system (SQL Server/SSRS) to match the results with the actual report against the data mart (Oracle).
- Owned and managed all changes to the data models. Created data models, solution designs and data architecture documentation for complex information systems.
- Developed Advance PL/SQL packages, procedures, triggers, functions, Indexes and Collections to implement business logic using SQL Navigator.
- Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
- Performed gap analysis and dependency analysis for current & future systems.
- Involved in Extract, Transform and Load (ETL) data from spreadsheets, flat files, database tables and other sources using SQL Server Integration Services (SSIS) and SQL Server Reporting Service (SSRS) for managers and executives.
- Designed Star Schema Data Models for Enterprise Data Warehouse using Power Designer.
- Created Mapping documents for Staging, ODS &Data Mart Layers.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Created communities, domains, assets and hierarchies in Collibra.
- Spearheaded the establishment of the Enterprise Business Glossary, including business terms, term descriptions and business rules; the tiering criteria (Tier 1, 2 or 3); and the data linkages between the metadata and lineage documents for the Collibra, IDQ and IMM data governance tools. Integrated processes to manage data quality.
- Prepared business metadata (Collibra) and technical metadata (IBM InfoSphere).
- Worked on OLAP models for data warehouse and data mart development using the Ralph Kimball methodology, as well as OLTP models, interacting with all involved stakeholders and SMEs to derive the solution.
- Created Use Case Diagrams using UML to define the functional requirements of the application.
- Created the best fit Physical Data Model based on discussions with DBAs and ETL developers.
- Identified required dimensions and Facts using Erwin tool for the Dimensional Model.
- Implemented ETL techniques for Data Conversion, Data Extraction and Data Mapping for different processes as well as applications.
- Created conceptual, logical and physical data models, data dictionaries, DDL and DML to deploy and load database table structures in support of system requirements.
- Designed ER diagrams (physical and logical, using Erwin) and mapped the data into database objects.
- Validated and updated the appropriate Models to process mappings, screen designs, use cases, business object model, and system object model as they evolved and changed.
- Created Model reports including Data Dictionary, Business reports.
- Generated SQL scripts and implemented the relevant databases with related properties, including keys, constraints, indexes and sequences.
Environment: OLTP, DBAs, DDL, DML, Erwin, UML diagrams, Snowflake schema, SQL, Data Mapping, Metadata, SAS, Informatica 9.5
Confidential
Data Modeler/Analyst
Responsibilities:
- Performed data analysis, data modeling, data migration and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Involved in the analysis of the existing credit card processing system, mapping phase according to functionality and data conversion procedure.
- Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
- Designed Star and Snowflake data models for the Enterprise Data Warehouse using Erwin.
- Validated and updated the appropriate LDMs to process mappings, screen designs, use cases, the business object model and the system object model as they evolved and changed.
- Created business requirement documents and integrated the requirements and underlying platform functionality.
- Maintained data model and synchronized it with the changes to the database.
- Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
- Extensively involved in the modeling and development of the reporting data warehousing system.
- Designed the database tables & created table and column level constraints using the suggested naming conventions for constraint keys.
- Used the ETL tool BO DS to extract, transform and load data into data warehouses from various sources such as relational databases, application systems, temp tables and flat files.
- Developed stored procedures and triggers.
- Wrote packages, procedures, functions, exceptions using PL/SQL.
- Reviewed the database programming for triggers, exceptions, functions, packages, procedures.
- Involved in the testing phase right from the Unit testing to the User Acceptance testing.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
Environment: Erwin 4, MS Visio, Oracle 10g, SQL Server 2000, BusinessObjects Data Integrator R2