Sr. Big Data Engineer Resume
Chicago, IL
SUMMARY:
- 9+ years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Experience with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Good experience with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.
- Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
- Extensive experience developing T-SQL and Oracle PL/SQL scripts, stored procedures, and triggers to implement business logic.
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
- Wrote SQL queries and PL/SQL code, created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
- Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
- Extensive experience writing Storm topologies that consume events from Kafka producers and emit them into Cassandra.
- Highly skilled with data modeling tools such as Erwin, Power Designer, and ER Studio.
- Proficient with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with a clear understanding of design goals, OLTP development, and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Experienced in designing and developing Oracle PL/SQL and shell scripts for data conversion and data cleansing.
- Experienced with scripting languages such as Python and Unix shell.
- Good knowledge of Amazon Web Services (AWS) such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Excellent knowledge of Big Data infrastructure, including distributed file systems (HDFS) and parallel processing (the MapReduce framework).
- Experienced in building data warehouses on the Azure platform using Azure Databricks and Azure Data Factory.
- Experience with Agile methodologies, including Scrum stories and sprints, in a Python-based environment, along with data analytics and data wrangling.
- Extensive experience with IDEs such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expert in Amazon EMR, S3, ECS, ElastiCache, DynamoDB, and Redshift.
- Experience installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Experience in dimensional data modeling, including star/snowflake schemas and fact and dimension tables.
- Extensive experience in technical consulting and end-to-end delivery covering data modeling and data governance.
- Implemented a distributed messaging queue with Apache Kafka to integrate with Cassandra.
- Well experienced in normalization and denormalization techniques for optimum performance in relational and dimensional database environments.
- Good working experience with Apache NiFi as an ETL tool for batch and real-time processing.
- Extensive experience using PL/SQL to write stored procedures, functions, and triggers.
TECHNICAL SKILLS:
Big Data Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Programming Languages: SQL, PL/SQL, Python, UNIX shell scripting
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio 2016, and Visual SourceSafe
Operating System: Windows 7/8/10, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Sr. Big Data Engineer
Responsibilities:
- Architected, designed, and developed business applications and data marts for reporting.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- The objective of this project was to build a cloud-based data lake in AWS using Apache Spark.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Exported data from Hive tables into the Netezza database.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull and load data into HDFS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
- Created data pipelines using process groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
- Loaded data from sources such as HDFS and HBase into Spark RDDs and implemented in-memory computations to generate the output.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used AWS Cloud for infrastructure provisioning and configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Configured and managed disaster recovery and backup for Cassandra data.
- Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Continuously tuned Hive queries and UDFs for faster performance by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
Environment: Apache Spark, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache NiFi 1.6, YARN, HBase, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Flume 1.8.
Confidential, Richmond, VA
Data Engineer
Responsibilities:
- Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Participated in JAD meetings to gather requirements and understand the end users' systems.
- Followed the SDLC methodology for data warehouse development using Kanbanize.
- Managed and reviewed Hadoop log files. Tested and reported defects following Agile methodology.
- Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful information for better decision-making.
- Performed data cleansing and data mining.
- Designed AWS architecture covering cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
- Loaded all data from relational databases into Hive using Sqoop. Received four flat files from different vendors, each in a different format (e.g., text, EDI, and XML).
- Ingested data into Hadoop (Hive/HDFS) from different data sources.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- The objective of this project was to build a cloud-based data lake in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Migrated data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Worked with AWS Cloud and on-premises environments for infrastructure provisioning and configuration.
- Wrote Perl scripts covering data feed handling, MarkLogic implementation, and communication with web services through the SOAP::Lite module and WSDL.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Developed shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
- Designed the Redshift data model and analyzed and improved Redshift performance.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Configured and managed disaster recovery and backup for Cassandra data.
- Performed file system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Developed customized classes for serialization and deserialization in Hadoop.
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, PostgreSQL, Hive, MapReduce, Zookeeper, MySQL, Eclipse, DynamoDB, PL/SQL, and Python.
Confidential, Newport Beach, CA
Data Analyst/Engineer
Responsibilities:
- Worked as a Data Modeler to generate data models using Erwin and developed a relational database system.
- Worked with SMEs, conducted JAD sessions, and documented the requirements using UML and use case diagrams.
- Developed detailed ER diagrams and data flow diagrams using modeling tools following the SDLC structure.
- Developed ETL processes in PL/SQL to pull data from various sources and transform it for reporting applications.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled performance requirements for databases in OLTP and OLAP models.
- Translated logical data models into physical database models and generated DDLs for DBAs.
- Involved in normalization/denormalization techniques for optimum performance in relational and dimensional database environments.
- Used SAS to build models that identify patterns and suggest possible problems and feasible solutions to the business.
- Created reports in Tableau based on client needs for dynamic interaction with the data produced.
- Applied normalization techniques, normalizing the data into Third Normal Form (3NF).
- Created UML based diagrams such as Activity diagrams using MS Visio.
- Performed analysis on existing data model to understand the methodology of model and help QlikView developers understand the requirements.
- Conducted analysis, gathered requirements, and developed use cases, data mappings, and workflow diagrams.
- Identified and documented data quality limitations that jeopardized internal and external data analysis.
- Used data warehousing techniques for data profiling to examine the data available in an existing database, and created a data mart.
- Performed streaming data ingestion into the distributed Spark environment using Kafka.
- Extracted data from various relational databases and performed SQL queries depending on how the data needed to be modified.
- Created 3NF business-area data models with denormalized physical implementations, and analyzed data and information requirements.
- Worked with Big Data eco system covering HDFS, HBase, YARN and MapReduce.
- Applied data warehouse concepts for data cleaning, data integration, data transformation, and periodic data refreshing.
- Created dashboards in Tableau showcasing the trends the client was following.
- Worked on reporting requirements and generated reports for the data model using Crystal Reports.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad hoc analytical reports using Excel and T-SQL.
Environment: Erwin 9.4, PL/SQL, SSAS, SSRS, OLTP, OLAP, MS Visio 2014, Kafka, SQL, 3NF, HDFS, HBase 2.3, MapReduce, T-SQL
Confidential, Des Plaines, IL
Data Modeler/Data Analyst
Responsibilities:
- As a Data Modeler/Data Analyst, translated business and data requirements into data models and was responsible for all data-related aspects of the project.
- Conducted JAD sessions, wrote meeting minutes, and documented the requirements.
- Worked with Business users for requirements gathering, business analysis and project coordination.
- Created conceptual, logical and physical models based on requirements gathered through interviews with the business users.
- Performed database performance tuning, including indexing and SQL statement optimization.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, OLAP, DW).
- Demonstrated QlikView to data analysts for creating custom reports, charts, and bookmarks.
- Performed data profiling, data cleansing, and de-duplication, with good knowledge of best practices.
- Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
- Developed process methodology for the Reverse Engineering phase of the project.
- Used the ETL tool Informatica to populate the database and to transform data from the old database to the new Oracle database.
- Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.
- Worked with business analysts to design weekly reports using Crystal Reports.
- Extracted data from databases like Oracle, SQL server and DB2 using Informatica to load it into a single repository for data analysis.
- Analyzed the existing data model and documented suspected design issues affecting system performance.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Developed a dimensional model based on star and snowflake schemas to grow the data warehouse.
- Participated in the tasks of knowledge migration from legacy to new database system
- Maintained large data sets, combining data from various sources in varying formats to create SAS data sets.
- Utilized SSRS and Cognos in creating and managing reports for an organization.
- Conducted one-on-one sessions with business users to gather warehouse requirements.
- Created FastExport, MultiLoad, and FastLoad UNIX script files for batch processing.
- Maintained and implemented data models for the Enterprise Data Warehouse using Erwin.
- Performed extensive PL/SQL programming of stored procedures, functions, packages, and triggers.
Environment: SQL, OLAP, OLTP, QlikView 10, ETL, Crystal Reports, Oracle 10g, Tableau, UNIX, SSRS, SAS, PL/SQL.
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analyst for requirements gathering, business analysis and project coordination.
- Migrated reports (Crystal Reports and Excel) from one domain to another using the Import/Export Wizard.
- Wrote complex SQL and PL/SQL procedures, functions, and packages to validate data and support the testing process.
- Used advanced Excel features such as pivot tables, LOOKUP, IF with AND, and INDEX/MATCH for data cleaning.
- Redesigned some of the previous models by adding some new entities and attributes as per the business requirements.
- Reviewed Stored Procedures for reports and wrote test queries against the source system (SQL Server) to match the results with the actual report against the Data mart (Oracle).
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Performed SQL validation to verify the integrity and record counts of data extracts in the database tables.
- Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
- Effectively used data blending feature in Tableau to connect different databases like Oracle, MS SQL Server.
- Transferred data with SAS/Access from the databases MS Access, Oracle into SAS data sets on Windows and UNIX.
- Provided guidance and insight on data visualization and dashboard design best practices in Tableau
- Performed Verification, Validation and Transformations on the Input data (Text files) before loading into target database.
- Executed data extraction and data profiling programs and analyzed data for accuracy and quality.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
- Documented designs and the transformation rules engine for use by all designers across the project.
- Designed and implemented basic SQL queries for testing and report/data validation
- Used ad hoc queries for querying and analyzing the data.
- Performed Gap Analysis to check the compatibility of the existing system infrastructure with the new business requirements.
Environment: SQL, PL/SQL, Oracle9i, SAS, Business Objects, Tableau, Crystal Reports, T-SQL, UNIX, MS Access 2010
