Big Data Engineer Resume
Bronx, NY
SUMMARY:
- Big Data, Data Analysis, and Data Modeling professional with 7+ years of applied information technology experience.
- Experience in designing, building, and implementing complete Hadoop ecosystems comprising HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, and MongoDB.
- Experienced in technical consulting and end-to-end delivery covering data analysis, data modeling, data governance, and the design, development, and implementation of solutions.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data according to requirements.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (HDFS, Pig, Hive, Flume, Sqoop).
- Practical understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
- Used Informatica PowerCenter for Extraction, Transformation, and Loading (ETL) of data from numerous sources such as flat files, XML documents, and databases.
- Expertise in Data Governance, Collibra Software, and Business Analytics.
- Hands-on experience in normalization (1NF, 2NF, 3NF, and BCNF) and denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
- Strong background in data modeling tools such as ERwin, ER/Studio, and MS Visio.
- Extensive experience in relational data modeling, dimensional data modeling, logical/physical data model design, ER diagrams, forward and reverse engineering, publishing ERwin diagrams, analyzing data sources, and creating interface documents.
- Designed and developed data marts following star schema and snowflake schema methodology, using industry-leading data modeling tools such as ERwin.
- Solid knowledge of data marts, Operational Data Stores (ODS), OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
- Expertise in developing big data solutions covering data ingestion and data storage.
- Good knowledge of cloud technologies such as Azure and AWS (EMR, S3, Redshift, EC2, DynamoDB).
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL (a brief sketch follows this summary).
- Implemented Python data analysis using Pandas, Matplotlib, Seaborn, TensorFlow, and NumPy.
- Worked with NoSQL databases such as HBase, Cassandra, and MongoDB for extracting and storing large volumes of data.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
- Logical and physical database design (tables, constraints, indexes, etc.) using ERwin, ER/Studio, Toad Data Modeler, and SQL Modeler.
- Highly motivated to work with Python and R scripts for statistical analytics and data quality reporting.
- Excellent knowledge of SQL and of coding PL/SQL packages and procedures.
- Proficient with AWS services such as EMR, S3, and CloudWatch for running and monitoring Hadoop/Spark jobs on AWS.
- Good understanding of and exposure to Python programming.
- Developed PL/SQL programs (Functions, Procedures, Packages and Triggers).
- Involved in report development using reporting tools such as Tableau; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
- Broad design, development, and testing experience with Talend Integration Suite and knowledge of performance tuning of mappings.
- Experience with cluster monitoring tools such as Apache Hue.
- Good experience in using Sqoop for traditional RDBMS data pulls.
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
- Extensive use of open-source software and web/application servers such as the Eclipse 3.x IDE and Apache Tomcat 6.0.
- Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams for the requirements.
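The Spark SQL item above mentions UDFs and DataFrames; the snippet below is a minimal, illustrative PySpark sketch of that pattern. The table and column names are hypothetical and not drawn from any project on this resume.

```python
# Minimal PySpark sketch: register a UDF and use it from both the
# DataFrame API and Spark SQL. All names here are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("spark-sql-udf-sketch").getOrCreate()

# Hypothetical source data; in practice this would come from Hive/HDFS.
orders = spark.createDataFrame(
    [(1, " retail ", 120.0), (2, "Wholesale", 560.0)],
    ["order_id", "channel", "amount"],
)

# A simple UDF that normalizes the channel name.
normalize_channel = F.udf(lambda c: c.strip().upper(), StringType())
spark.udf.register("normalize_channel", lambda c: c.strip().upper(), StringType())

# DataFrame API usage.
orders.withColumn("channel_norm", normalize_channel(F.col("channel"))).show()

# Equivalent Spark SQL usage against a temporary view.
orders.createOrReplaceTempView("orders")
spark.sql(
    "SELECT order_id, normalize_channel(channel) AS channel_norm, amount FROM orders"
).show()
```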
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, Hive, Pig, Sqoop, Flume, Oozie, Storm and ZooKeeper.
Data Modeling Tools: Erwin, Oracle Designer, ER/Studio.
ETL Tools: Pentaho, Informatica PowerCenter 9.6, etc.
Operating Systems: HP-UX, Red Hat Linux, Ubuntu Linux, and Windows.
Cloud Platform: Azure, AWS.
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9.
NoSQL Databases: HBase, Cassandra, MongoDB.
Web/Application Servers: Apache Tomcat, WebLogic, JBoss.
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, Ant, Hudson, Sonar, JDeveloper, Assent PMD, DbVisualizer.
Version control: SVN, CVS, GIT.
Web Services: REST, SOAP.
Languages: C, Python, PL/SQL, Pig Latin, HiveQL, Unix shell scripting.
PROFESSIONAL EXPERIENCE:
Confidential - Bronx, NY
Big Data Engineer
Responsibilities:
- Worked with the architecture team to design end-to-end streaming and batch data workflows.
- Analyzed large and critical datasets using HDFS, HBase, Hive, Hive UDFs, Pig, Sqoop, and ZooKeeper.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Performed data transformations in Hive and used partitions and buckets for performance improvement (see the partitioning sketch after this list).
- Ingested data into HDFS using Sqoop and scheduled incremental loads into HDFS.
- Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for dashboard reporting.
- Installed, configured, and maintained the Hadoop cluster for application development along with Hadoop ecosystem components such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Worked with Hadoop infrastructure to store data in HDFS and used HiveQL to migrate the underlying SQL codebase to Azure.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Loaded real-time data into NoSQL databases such as Cassandra.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Developed Pig scripts for transforming data, making extensive use of event joins, filtering, and pre-aggregation.
- Performed data scrubbing and processing with Apache NiFi and used it for workflow automation and coordination.
- Used Sqoop to import data into HDFS and Hive from Oracle database.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Developed simple to complex streaming jobs using Python, Hive, and Pig.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in various phases of development; analyzed and developed the system using the Agile Scrum methodology.
- Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
- Used ZooKeeper to provide coordination services to the cluster.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Built Azure Data Warehouse Table Data sets for Power BI Reports.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Worked on BI reporting with AtScale OLAP for big data.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Developed a workflow using NiFi to automate the tasks of loading data into HDFS.
- Performed load balancing of ETL processes and database performance tuning for ETL processing tools.
- Loaded data from Teradata into HDFS using Teradata Hadoop connectors.
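The Hive partitioning and bucketing work above follows a common pattern; the sketch below illustrates that pattern through PySpark with Hive support. The database objects, columns, and HDFS location are assumptions for the example, not details of this engagement.

```python
# Illustrative PySpark-on-Hive sketch of a partitioned external table.
# Table, column, and path names are assumed for the example.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External table over data already landed in HDFS, partitioned by date
# so queries can prune partitions instead of scanning everything.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
        user_id BIGINT,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
    LOCATION '/data/web_events'
""")

# Register a partition for a day of data written by the ingest job.
spark.sql(
    "ALTER TABLE web_events ADD IF NOT EXISTS PARTITION (event_date='2018-06-01')"
)

# Filtering on the partition column keeps the scan limited to that day.
spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM web_events
    WHERE event_date = '2018-06-01'
    GROUP BY event_date
""").show()
```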
Environment: Hive, Pig, Mahout, NiFi, Python, Hadoop, Azure, DynamoDB, Kibana, NoSQL, Sqoop, MySQL.
Confidential, Dover, NH
Big Data Engineer
Responsibilities:
- Implemented the complete big data flow, from upstream data ingestion into HDFS through processing and analysis of the data in HDFS.
- Implemented solutions for ingesting data from various sources and processing the data at rest using big data technologies such as Hadoop, HBase, and Hive.
- Installed, configured, and maintained various components of the Hadoop ecosystem.
- Installed Hadoop and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Debugged and troubleshot issues in development and test environments; conducted root-cause analysis and resolved production problems and data issues.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance (see the sketch after this list).
- Migrated ETL jobs to Pig scripts performing transformations, event joins, and some pre-aggregations before storing the data in HDFS, using UDFs developed in Python.
- Designed both 3NF data models for ODS, OLTP, and OLAP systems and dimensional data models using star and snowflake schemas.
- Ingested data into Hadoop (Hive/HDFS) from different data sources.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Rewrote an existing Python module to deliver data in specific formats.
- Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
- Worked with the NoSQL database HBase for real-time data analytics.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, Pig, and Sqoop.
- Finalized the naming standards for data elements and ETL jobs and created a data dictionary for metadata management.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- The objective of this project was to build a data lake as a cloud-based solution on AWS using Apache Spark.
- Designed, developed and maintained data integration programs in Hadoop and RDBMS environment with both RDBMS and NoSQL data stores for data access and analysis.
- Used all major ETL transformations to load the tables through Informatica mappings.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Worked on data modeling and advanced SQL with columnar databases on AWS.
- Extensive experience writing UNIX shell scripts and automating ETL processes using UNIX shell scripting.
- Worked on managing and reviewing Hadoop log files. Tested and reported defects in an Agile Methodology perspective.
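As referenced in the migration item above, the following is a minimal, hedged sketch of how a Pig-style load/filter/join/aggregate flow maps onto the Spark DataFrame API. The HDFS paths, file formats, and column names are placeholders, not the actual job definitions.

```python
# Hypothetical sketch: a Pig-style LOAD / FILTER / JOIN / GROUP flow
# rewritten with the Spark DataFrame API. Paths and columns are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pig-to-spark-sketch").getOrCreate()

# Equivalent of the Pig LOAD statements (assumed staging files in HDFS).
orders = spark.read.option("header", True).csv("hdfs:///staging/orders.csv")
customers = spark.read.option("header", True).csv("hdfs:///staging/customers.csv")

# Equivalent of Pig FILTER, JOIN, and GROUP ... FOREACH ... GENERATE.
sales_by_region = (
    orders
    .filter(F.col("status") == "COMPLETE")
    .join(customers, on="customer_id", how="inner")
    .groupBy("region")
    .agg(F.sum(F.col("amount").cast("double")).alias("total_amount"))
)

# Persist the result back to HDFS for downstream consumers.
sales_by_region.write.mode("overwrite").parquet("hdfs:///curated/sales_by_region")
```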
Environment: Hadoop, Cloudera, Talend, HDFS, Hive, Pig, DB2, SQL, Linux, Yarn, NDM, Informatica, AWS, Windows & Microsoft Office.
Confidential, Long beach, CA
Sr. Data Modeler/Analyst
Responsibilities:
- As a Sr. Data Modeler/Analyst, was responsible for all data-related aspects of the project.
- Worked on Software Development Life Cycle (SDLC) with good working knowledge of testing, Agile methodology, disciplines, tasks, resources and scheduling.
- Developed normalized Logical and Physical database models to design OLTP system for Reference and Balance data conformance using ER studio modeling tool.
- Worked with SQL, Python, Oracle PL/SQL, stored procedures, triggers, and SQL queries, and loaded data into the data warehouse/data marts.
- Developed the logical data models and physical data models that capture current state/future state data elements and data flows using ER Studio.
- Involved in preparing the design flow for the DataStage objects to pull data from various upstream applications, perform the required transformations, and load the data into various downstream applications.
- Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files, and SQL Server.
- Delivered dimensional data models using ER/Studio to bring the Employee and Facilities domain data into the Oracle data warehouse.
- Involved in maintaining and updating Metadata Repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
- Created DDL scripts using ER Studio and source to target mappings to bring the data from source to the warehouse.
- Mapped business data lineage from critical data elements to DQ measures to business rules in Collibra.
- Designed the ER diagrams, logical model (relationships, cardinality, attributes, and candidate keys), and physical database (capacity planning, object creation, and aggregation strategies) for Oracle and Teradata.
- Reverse engineered DB2 databases and then forward engineered them to Teradata using ER/Studio.
- Part of the team conducting logical data analysis and data modeling JAD sessions; communicated data-related standards.
- Involved in meetings with SMEs (subject matter experts) to analyze the multiple sources.
- Involved in writing and optimizing SQL queries in Teradata.
- Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements.
- Worked extensively on ER Studio for multiple Operations across Atlas Copco in both OLAP and OLTP applications.
- Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
- Worked with the DBA to convert logical Data models to physical Data models for implementation.
Environment: Business Objects, ER/Studio, Oracle SQL Developer, SQL Server 2008, Teradata, SSIS, Windows, MS Excel.
Confidential
Data Modeler/Analyst
Responsibilities:
- Performed data analysis, data modeling, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
- Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
- Generated and reviewed reports to analyze data using different Excel formats.
- Designed star and snowflake data models for the enterprise data warehouse using ERwin.
- Validated and updated the appropriate LDMs to reflect process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
- Troubleshot, resolved, and escalated data-related issues and validated data to improve data quality.
- Participated in testing of procedures and Data, utilizing PL/SQL, to ensure integrity and quality of Data in Data warehouse.
- Reported on emerging trends to identify changes or trouble within the systems using Access and Crystal Reports.
- Maintained Excel workbooks, including developing pivot tables, exporting data from external SQL databases, producing reports, and updating spreadsheet information.
- Extracted Data from DB2, COBOL Files and converted to Analytic SAS Datasets.
- Performed data analysis and extensive data validation by writing several complex SQL queries (see the reconciliation sketch after this list).
- Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
- Extensively involved in the modeling and development of the reporting data warehousing system.
- Designed database tables and created table- and column-level constraints using the suggested naming conventions for constraint keys.
- Used the ETL tool BO DS to extract, transform, and load data into data warehouses from various sources such as relational databases, application systems, temp tables, and flat files.
- Developed stored procedures and triggers.
- Wrote packages, procedures, functions, and exception handling using PL/SQL.
- Reviewed the database programming for triggers, exceptions, functions, packages, procedures.
- Involved in the testing phases from unit testing through user acceptance testing.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
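The SQL-based data validation noted above typically comes down to reconciling source and target objects; the helper below is a hedged Python sketch of one such check (row-count reconciliation). The function names are hypothetical, and any DB-API 2.0 driver (for example cx_Oracle or pyodbc) could supply the connections.

```python
# Hypothetical reconciliation helpers: compare row counts between source
# and target tables over existing DB-API 2.0 connections. Names are
# placeholders; drivers and table lists come from the caller.

def row_count(conn, table):
    """Return the row count of a table via an existing DB-API connection."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM " + table)  # trusted, non-user-supplied names
    count = cur.fetchone()[0]
    cur.close()
    return count

def reconcile(source_conn, target_conn, table_pairs):
    """Print a row-count comparison for each (source_table, target_table) pair."""
    for src, tgt in table_pairs:
        src_count = row_count(source_conn, src)
        tgt_count = row_count(target_conn, tgt)
        status = "OK" if src_count == tgt_count else "MISMATCH"
        print(f"{src} -> {tgt}: source={src_count}, target={tgt_count} [{status}]")

# Example usage (connections supplied by the relevant drivers):
# reconcile(oltp_conn, dwh_conn, [("ORDERS", "DW_ORDER_FACT")])
```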
Environment: ERwin 4, MS Visio, Oracle 10g, SQL Server 2000, Business Objects Data Integrator R2.