
Sr. Big Data Engineer Resume


Lowell, AR

SUMMARY

  • Overall 6 years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing and developing solutions using Big Data and ETL technologies.
  • Solid experience working with ETL tool environments such as SSIS and Informatica, and with reporting tool environments such as SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake and AWS Redshift.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Experience in applying Text Analytics and Data Mining solutions to various business problems, generating data visualizations using SAS and Python, and creating dashboards using tools such as Tableau.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Schema Modeling for Fact and Dimension tables) using Analysis Services.
  • Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import and Data Export using ETL tools such as Informatica PowerCenter.
  • Experience in designing, building and implementing a complete Hadoop ecosystem comprising HDFS, Hive, Pig, Sqoop, Oozie, HBase and MongoDB.
  • Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD and SQL*Loader.
  • Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM and MDM.
  • Experienced in using Python for statistical computing.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop) and NoSQL databases like MongoDB, HBase, Cassandra.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Strong experience working with databases like Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Solid experience working with analysis tools like Tableau for regression analysis, pie charts and bar graphs.

TECHNICAL SKILLS

Big Data technologies: HBase 1.2, HDFS, Sqoop 1.4, Hadoop 3.0, Hive 2.3, Pig 0.17, Oozie 5.1

Cloud Architecture: Amazon AWS (EC2, S3, Elasticsearch, Elastic Load Balancing) & basic MS Azure

Data Modeling Tools: ER/Studio V17, Erwin 9.7, Sybase PowerDesigner.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and Defect Tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center, Requisite), MS Visio & Visual SourceSafe

Operating System: Windows, Unix & Linux.

ETL/Data Warehouse Tools: Informatica 9.6, Tableau 10 and Snowflake.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Sr. Big Data Engineer

Confidential - Lowell, AR

Responsibilities:

  • As a Sr. Big Data Engineer, primarily involved in the development of Big Data solutions focused on pattern matching and predictive modeling.
  • The objective of this project was to migrate all services from in-house infrastructure to the cloud (AWS).
  • Built data warehousing solutions for analytics/reporting using the AWS Redshift service.
  • Built a data lake as a cloud-based solution in AWS using Amazon S3 and made it the single source of truth.
  • Involved in creating SnowSQL scripts to load data from S3 buckets into Snowflake tables and transforming the data according to business requirements (see the Snowflake load sketch after this list).
  • Migrated various data types, including streaming, structured and unstructured data from multiple sources, as well as legacy data.
  • Utilized AWS services with a focus on big data analytics, enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability and flexibility.
  • Designed the AWS architecture covering cloud migration, AWS EMR, DynamoDB, Redshift and event processing using Lambda functions.
  • Built a NoSQL solution for non-structured data using the AWS DynamoDB service.
  • Developed Python programs to consume data from APIs as part of several data extraction processes and store the data in AWS S3 (an illustrative sketch follows this list).
  • Involved in data migration to Snowflake using AWS S3 buckets.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Presto's built-in connectors for Redshift and Hive to prepare datasets for applying advanced analytics (ML) to certain use cases.
  • Implemented performance optimizations on the Presto SQL queries to improve query retrieval times.
  • Used query execution plans in Presto for tuning the queries that are integrated as data sources for Tableau dashboards.
  • Designed the Redshift data model and worked on Redshift performance improvements that enable faster query retrieval and also improve the dependent reporting/analytics layers.
  • Developed data transition programs from DynamoDB to AWS Redshift (ETL process) using AWS Lambda by creating Python functions for certain events based on use cases (see the Lambda sketch after this list).
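
The Snowflake loads described above (S3 buckets into Snowflake tables) typically reduce to a stage-plus-COPY pattern. Below is a minimal sketch, shown through the Python connector for consistency with the other sketches in this document; the database, table, stage and credential values are hypothetical placeholders, not the project's actual objects.

    # Minimal sketch of an S3 -> Snowflake load. The orders_raw table, the
    # raw_s3_stage external stage and all credentials are placeholders.
    import snowflake.connector

    LOAD_SQL = """
    COPY INTO staging.orders_raw
    FROM @staging.raw_s3_stage/orders/
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    ON_ERROR = 'ABORT_STATEMENT'
    """

    def load_orders() -> None:
        # Credentials would normally come from a secrets manager, not literals.
        conn = snowflake.connector.connect(
            account="my_account",
            user="etl_user",
            password="********",
            warehouse="ETL_WH",
            database="ANALYTICS",
            schema="STAGING",
        )
        try:
            cur = conn.cursor()
            # Stage-to-table copy; Snowflake skips files it has already loaded,
            # so re-running the COPY is safe for unchanged files.
            cur.execute(LOAD_SQL)
            print(cur.fetchall())  # per-file load results
        finally:
            conn.close()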
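
For the Python API extraction jobs that land data in S3, the usual shape is an HTTP pull followed by a write to a date-partitioned raw prefix. The sketch below is illustrative only; the endpoint URL, bucket name and key layout are hypothetical.

    # Illustrative API-to-S3 extractor; the endpoint, bucket and key layout
    # are hypothetical placeholders rather than the project's actual values.
    import json
    from datetime import datetime, timezone

    import boto3
    import requests

    API_URL = "https://api.example.com/v1/events"   # placeholder endpoint
    BUCKET = "my-raw-data-lake"                     # placeholder bucket

    def extract_to_s3() -> str:
        response = requests.get(API_URL, params={"limit": 1000}, timeout=30)
        response.raise_for_status()
        records = response.json()

        # Partition the raw landing zone by load date.
        key = "events/dt={}/extract.json".format(
            datetime.now(timezone.utc).strftime("%Y-%m-%d")
        )
        boto3.client("s3").put_object(
            Bucket=BUCKET,
            Key=key,
            Body=json.dumps(records).encode("utf-8"),
        )
        return key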
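
The DynamoDB-to-Redshift Lambda functions mentioned in the last bullet commonly follow a stream-to-staging pattern: the handler flattens DynamoDB Stream records into S3, and Redshift then loads the staged files with COPY. The sketch below is written under that assumption; the bucket, schema, table and IAM role names are placeholders.

    # Sketch of a DynamoDB Streams -> S3 -> Redshift staging flow.
    # Bucket, schema, table and IAM role names are placeholders.
    import json
    import uuid

    import boto3

    s3 = boto3.client("s3")
    STAGING_BUCKET = "my-redshift-staging"  # placeholder bucket

    COPY_SQL = """
    COPY analytics.orders_events
    FROM 's3://my-redshift-staging/{key}'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
    FORMAT AS JSON 'auto'
    """

    def lambda_handler(event, context):
        # Flatten the NewImage of each stream record into a JSON line.
        rows = []
        for record in event.get("Records", []):
            image = record.get("dynamodb", {}).get("NewImage")
            if image:
                rows.append(json.dumps({k: list(v.values())[0] for k, v in image.items()}))

        if not rows:
            return {"written": 0}

        key = f"orders_events/{uuid.uuid4()}.json"
        s3.put_object(Bucket=STAGING_BUCKET, Key=key, Body="\n".join(rows).encode("utf-8"))
        # A downstream step (e.g. the Redshift Data API or a scheduled job)
        # would run COPY_SQL.format(key=key) to load the staged file.
        return {"written": len(rows), "s3_key": key}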

Big Data Engineer

Confidential - Hartford, CT

Responsibilities:

  • As a Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
  • Worked closely with Business Analysts to review the business specifications of the project and also to gather the ETL requirements.
  • Assisted in leading the plan, build and run states within the Enterprise Analytics Team.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
  • Built a data lake as a cloud-based solution in AWS using Apache Spark and provided visualization of the ETL orchestration using the CDAP tool.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System (HDFS) and open source frameworks.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked on Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5 to pull/load data into the HDFS system.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (a PySpark sketch of the same aggregation pattern follows this list).
  • Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC on Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Used AWS Cloud with Infrastructure Provisioning / Configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Actively involved in design, new development and SLA-based support tickets for Big Machines applications.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Utilized Big Data components like tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tHbaseInput, tHbaseOutput, tSqoopImport and tSqoopExport.
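
The Spark aggregation work in this role was done in Scala; purely for illustration, and to stay consistent with the other Python sketches in this document, the equivalent DataFrame pattern is shown below in PySpark. The table and column names (datalake.weblogs, user_id, bytes) are hypothetical, and the final Hive write stands in for the hand-off to the Sqoop export.

    # PySpark rendering of the DataFrame aggregation pattern described above
    # (the project code itself was written in Scala). Table and column names
    # are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("weblog-aggregation")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw weblogs from the Hive-backed data lake.
    weblogs = spark.table("datalake.weblogs")

    # Aggregate traffic per user per day.
    daily_usage = (
        weblogs
        .withColumn("dt", F.to_date("event_time"))
        .groupBy("dt", "user_id")
        .agg(
            F.count("*").alias("requests"),
            F.sum("bytes").alias("total_bytes"),
        )
    )

    # Write the aggregate back to Hive; a Sqoop export (or a direct JDBC write)
    # can then move it into the downstream RDBMS.
    daily_usage.write.mode("overwrite").saveAsTable("datalake.daily_usage")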

ETL Data Engineer

Confidential - Denver, CO

Responsibilities:

  • Analyzed Big Data using Hadoop technologies through hands-on projects.
  • Recreated existing SQL Server objects in Snowflake.
  • Also converted SQL Server mapping logic to SnowSQL queries.
  • Involved in ETL, Data Integration and Migration using Sqoop to load data into HDFS on a regular basis.
  • Wrote Hive queries for ad hoc data analysis to meet the business requirements.
  • Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance (see the Hive sketch after this list).
  • Solved performance issues in Hive scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Created Hive tables and worked on them using HiveQL, implementing transformations, performing analysis and creating visualization reports.
  • Performed ETL data cleaning, integration, and transformation using Sqoop.
  • Designed a data warehouse using Hive and created managed Hive tables in Hadoop.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Hive, Sqoop and Oozie.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the R&D team.
  • Managed the data by storing it in tables and created visualization reports.
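
The partitioned and bucketed Hive design described above typically pairs an external staging table over the raw HDFS directory with a managed, partitioned main table loaded by INSERT ... SELECT. A minimal sketch follows, assuming hypothetical table, column and host names and a PyHive connection; the original work may just as well have run the same HiveQL through Beeline or an Oozie action.

    # Sketch of the external-staging -> managed-main Hive pattern with
    # partitioning and bucketing. Table, column and host names are
    # hypothetical placeholders.
    from pyhive import hive

    STATEMENTS = [
        # External staging table over the raw HDFS landing directory.
        """
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_raw (
            order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION '/data/landing/orders'
        """,
        # Managed main table, partitioned by day and bucketed by customer.
        """
        CREATE TABLE IF NOT EXISTS warehouse.orders (
            order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING
        )
        PARTITIONED BY (dt STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
        """,
        # Move data from staging into the partitioned main table.
        """
        INSERT OVERWRITE TABLE warehouse.orders PARTITION (dt)
        SELECT order_id, customer_id, amount, order_ts,
               to_date(order_ts) AS dt
        FROM staging.orders_raw
        """,
    ]

    def run() -> None:
        conn = hive.Connection(host="hive-server", port=10000, username="etl")
        cursor = conn.cursor()
        # Allow dynamic partitions for the INSERT ... PARTITION (dt) step.
        cursor.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
        for stmt in STATEMENTS:
            cursor.execute(stmt)
        conn.close()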

ETL Data Analyst

Confidential

Responsibilities:

  • Defined and modified standard design-pattern ETL frameworks, data model standards and guidelines, and ETL best practices.
  • Designed physical/logical data models based on Star and Snowflake schemas using the Erwin modeler to build an integrated enterprise data warehouse.
  • Coordinated with various business users, stakeholders and SMEs to obtain functional expertise, review design and business test scenarios, participate in UAT and validate data from multiple sources.
  • Performed detailed data investigation and analysis of known data quality issues in related databases through SQL (a profiling-query sketch follows this list).
  • Performed data validation, data profiling, data auditing and data cleansing activities to ensure high quality Business Objects report deliveries.
  • Configured sessions for different situations, including incremental aggregation, pipeline partitioning, etc.
  • Created effective Test Cases and performed Unit and Integration Testing to ensure the successful execution of data loading process.
  • Created SSIS packages to export and import data from CSV files, text files and Excel spreadsheets.
  • Generated periodic reports based on statistical analysis of the data from various time frames and divisions using SQL Server Reporting Services (SSRS).
  • Developed different kinds of reports such as sub-reports, charts, matrix reports and linked reports.
  • Analyzed client data and business terms from a data quality and integrity perspective.
  • Worked to ensure high levels of data consistency between diverse source systems including flat files, XML and SQL Database.
  • Developed and ran ad hoc data queries from multiple database types to identify systems of record, data inconsistencies and data quality issues.
  • Conducted design discussions and meetings to arrive at the appropriate data mart design using the Inmon methodology.
  • Maintained Excel workbooks, such as development of pivot tables, exporting data from external SQL databases, producing reports and updating spreadsheet information.
  • Worked with Quality Improvement, Claims and other operational business owners to ensure appropriate actions were taken to address rejections and to ensure reprocessing of previously rejected data.
  • Ensured the quality, consistency, and accuracy of data in a timely, effective and reliable manner.
  • Worked with the Business analyst for gathering requirements.
  • Created SSIS packages to load data into the data warehouse using various SSIS tasks such as Execute SQL Task, Bulk Insert Task, Data Flow Task, File System Task, Send Mail Task, ActiveX Script Task, XML Task and various transformations.
  • Used Sqoop & Flume for data ingestion.
  • Migrated all programs, jobs and schedules to Hadoop.
  • Used Erwin for relational database and dimensional data warehouse designs.
  • Conducted and participated in JAD sessions with the users, modelers, and developers for resolving issues.
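
Data-quality investigation of the kind described above usually starts from simple profiling queries: row counts, null rates, duplicates and out-of-range values. A minimal sketch is shown below, assuming a hypothetical claims table and a pyodbc connection to SQL Server; the actual checks on the project were requirement-specific.

    # Minimal data-profiling sketch against SQL Server via pyodbc.
    # The Claims table, its columns and the connection string are hypothetical.
    import pyodbc

    CONN_STR = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sqlserver01;DATABASE=ClaimsDW;Trusted_Connection=yes;"
    )

    PROFILING_QUERIES = {
        "row_count": "SELECT COUNT(*) FROM dbo.Claims",
        "null_member_ids": "SELECT COUNT(*) FROM dbo.Claims WHERE MemberID IS NULL",
        "duplicate_claim_ids": """
            SELECT COUNT(*) FROM (
                SELECT ClaimID FROM dbo.Claims
                GROUP BY ClaimID HAVING COUNT(*) > 1
            ) AS dups
        """,
        "future_service_dates": "SELECT COUNT(*) FROM dbo.Claims WHERE ServiceDate > GETDATE()",
    }

    def profile() -> dict:
        # Run each profiling query and collect its single scalar result.
        results = {}
        conn = pyodbc.connect(CONN_STR)
        try:
            cursor = conn.cursor()
            for name, sql in PROFILING_QUERIES.items():
                cursor.execute(sql)
                results[name] = cursor.fetchone()[0]
        finally:
            conn.close()
        return results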
