Sr. Data Engineer/Hadoop Developer Resume
New York
SUMMARY
- 8+ years of experience in the Data Analytics/Engineering domain, with data mining and visualization using Python, Excel, Tableau, SQL, and PL/SQL.
- Well-versed in areas such as data processing, communications and change development, data research, trafficking, and information management.
- In-depth knowledge of SQL Server and Hadoop.
- Experience in Big Data analytics using SQL and Alteryx.
- Implemented extensive data visualization using Tableau, SAP Business Objects, and Excel.
- Self-motivated leader and team builder, consistently motivating others for success.
- Ability and strong willingness to use data to solve large-scale ambiguous problems.
- Apply innovative techniques to achieve results.
- Worked cross-functionally to create and communicate key insights, influence key decisions, and offer recommendations to leadership.
- Actively contributed to cross-functional project teams that included ETL building, Data Quality validation and assurance, and Data visualization.
- Well-versed in Agile methodologies.
- Analyzed data/application logs and flows to monitor performance.
TECHNICAL SKILLS
Hadoop Framework: HDFS, MapReduce
Hadoop Eco-System: Hive, NiFi, Sqoop, HBase, Spark, SQL, Spark Streaming, ZooKeeper, YARN, Kafka, Flume, Ambari, Pig
NoSQL Databases: HBase
Languages: Scala, Python, HiveQL, Unix shell scripts, Java
ETL: Informatica, NiFi, Alteryx, SSIS
Real-Time Messaging Systems: Kafka, Kinesis Streaming, Flume
Operating Systems: macOS, UNIX, Windows XP/Vista/7/8/10
Databases: Oracle, DB2, SQL Server, MySQL, Teradata
IDE: IntelliJ, Eclipse
Methodologies: Agile, Waterfall
Amazon Web Services (AWS): EC2, EMR, S3, Snowflake, CloudWatch, SNS, Lambda, DynamoDB, Redshift, Elasticsearch, Athena, Kinesis
MS Azure: Azure Databricks, Azure Data Factory
IBM Cloud: Event Streams (Kafka), Kubernetes, COS, Kibana, LogDNA, Sysdig
Scheduling Tools: Airflow, TWS, Control-M, Autosys
Build Tools: SBT, Maven
Ticketing Tools: JIRA, ServiceNow, BMC
PROFESSIONAL EXPERIENCE
Confidential, New York
Sr. Data Engineer/Hadoop Developer
Responsibilities:
- Developed analytical modules for different buyers within Randstad and integrated them with DOMO using Python and Google Cloud Platform.
- Actively participated in requirement gathering with different teams to meet project targets.
- Created Spark applications to write reporting data into Snowflake. Created secure views in Snowflake so that users can query the data through DOMO.
- Implemented row- and column-level security in Snowflake, including masking via external tokenization for columns within a table or view.
- Developed end-to-end data pipelines connecting to Azure Data Lake through heavy SQL procedures, alongside end-to-end Alteryx workflows as required for specific MSP customers.
- Created workflows in Python for Spark jobs. Scheduled Spark data ingestion jobs using the Celery executor in Airflow.
- Implemented dynamic data mapping with live Python data scripting using tools such as Pydomo and data integrators.
- Developed backend procedures within SSIS for end-to-end data loading, analysis, and visualization.
Environment: MS SQL Server, Alteryx, Tableau, Excel, Hive, Teradata, Informatica, AWS Lambda, SAS, HDFS, Impala
Confidential, New Jersey
Sr. Hadoop Developer / Data Engineer
Responsibilities:
- Developed EOD analytical inventory reporting, along with the on-hand summary, to support business leadership decisions.
- Actively participated in gathering the requirements that ETL processes needed to meet to accomplish project targets.
- Worked on a POC using HBase, Flume, YARN, and Hive on Spark for a different project requirement.
- Imported data from various data sources and performed transformations using Hive and Spark SQL.
- Involved in a complete-lifecycle Hadoop implementation project, specializing in, but not limited to, writing Hive queries (HQL) and Sqoop jobs.
- Imported data from various data sources, performed transformations using Hive, MapReduce, Spark, and Pig, and loaded the data into HDFS.
- Developed end-to-end data pipelines connecting to Azure Data Lake through heavy SQL procedures, alongside end-to-end Alteryx workflows as required for specific wireline customers.
- Implemented dynamic data mapping with a live ETL tool such as Informatica and scheduled data-centric jobs for the same.
- Used the HDFS file system check (fsck) to check the health of files in HDFS. Developed job flows in Oozie to automate the workflow of Hive jobs.
- Spun up EMR clusters in AWS to process large volumes of data stored in S3 and push it to HDFS.
- Worked with different data sources such as flat files, XML, JSON, ORC, Parquet, and Avro files using different SerDes in Hive.
- Strong data warehousing ETL experience using Informatica 9.1/8.6.1/8.5/8.1 PowerCenter client tools (Mapping Designer, Repository Manager, Workflow Manager/Monitor) and server tools (Informatica Server, Repository Server Manager).
- End-to-end experience across all stages of the Software Development Life Cycle (SDLC), including business requirement analysis, data mapping, build, unit testing, systems integration, and user acceptance testing.
- Used Alteryx to build analytical workflows that extract, transform, and process data and produce business goal-oriented reports.
- Developed, executed, and monitored SQL Server jobs to improve table data efficiency, along with performance tuning.
- Continuously monitored workflows to proactively identify performance issues and fix them beforehand for smooth data delivery.
- Analyzed hidden data to identify data/structure discrepancies and fixed them using a fetch-analyze-clean-load approach within an Alteryx workflow.
- Tested and validated real-time ETL workflows and scheduled them in the application.
- Developed batch ETL workflows using SQL queries and Alteryx logic, targeting the refined data to a specific schema.
- Modeled data and derived meaningful visualizations using Tableau, Excel, and Looker.
- Experience with the Hadoop ecosystem, including Hive, Impala, Pig, HBase, Oozie, Sqoop, HCatalog, and HUE.
- Implemented dynamic data visualizations at a deeper level of granularity using Tableau and SAS.
- Used Tableau to analyze and visualize supply chain activations data.
Environment: MS SQL Server, Alteryx, Tableau, Excel, Hive, Teradata, Informatica, AWS Lambda, SAS, HDFS, Impala
Confidential, Sunnyvale, CA
Hadoop Developer/Data Engineer
Responsibilities:
- Worked on a Data Application managing extensive amounts of data for the Finance Dept.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Actively participated in gathering the requirements that ETL processes needed to meet to accomplish project targets.
- Supported the data application system by analyzing, monitoring, and generating data-centric reports for business decisions.
- Used Alteryx to load, clean, analyze, predict, and generate the required reports for business and leadership decisions.
- Worked with Operations and Product teams to minimize incident impact on our business.
- Worked with SQL and PL/SQL to manage and analyze large amounts of data.
- Continuously monitored workflows to proactively identify performance issues and fix them beforehand for smooth data delivery.
- Created ETL flows to help R&D teams make data-centric decisions.
- Provided ongoing support to the Audit team, focused on accurate audit execution.
- Analyzed hidden data for potential discrepancies, investigated errors, and performed data cleanup.
- Modeled data and derived meaningful visualizations using Tableau.
- Used Spark Streaming APIs to perform transformations and actions on data received from Kafka in near real time and persisted the results into HBase.
- Set up Impala, created tables, and developed scripts in HUE for POC purposes.
- Used Spark SQL to process data from Hive tables for faster processing of analytical data.
- Performed ETL mass and batch loads via SQL*Loader control files and loader jobs to refresh data from the mainframe source, and created technical documentation to help users understand the data flow.
Environment: Alteryx, Oracle PL/SQL, SQL, Numbers, Tableau, Impala, HUE
Confidential, Grapevine, TX
Hadoop Developer/Data Engineer
Responsibilities:
- Possess expert knowledge of operating systems, devices, applications, and software.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Actively involved in building the ETL architecture and source-to-target mapping to load data into the data warehouse.
- Modified existing mappings to support enhancements for new business requirements.
- Prepared mapping documents to outline data flow from sources to targets.
- Used Alteryx to profile and analyze source system data to identify edge cases and quality issues, and communicated the findings to the required audience.
- Translated high-level design specifications into simple ETL coding and mapping standards.
- Provided technical support to teams within the organization and to external clients as required.
- Developed batch ETL workflows using SQL queries and Alteryx logic, targeting the refined data to a specific schema.
- Developed end-to-end data pipelines connecting to Azure Data Lake through heavy SQL procedures, alongside end-to-end Alteryx workflows as required for specific MSP customers.
- Implemented data mappings to load data into staging tables and then into dimension and fact tables.
- Used Spark SQL to process data from Hive tables for faster processing of analytical data.
- Generated analytical reports, and developed and monitored business-oriented procedures with scheduled tuning to handle the required volumes of data.
- Performed QA of new data feeds or views required for repeatable tools.
- Developed Python and SQL scripts to track sales and perform sales analysis.
- Maintained a ticketed query system, ensuring that a comprehensive database of queries and resolutions was kept up to date.
- End-to-end experience across all stages of the Software Development Life Cycle (SDLC), including business requirement analysis, data mapping using Informatica, build, unit testing, systems integration, and user acceptance testing.
- Analyzed sales data, derived meaningful insights, and shared them with the technical team.
- Regularly maintained and updated technical documents and procedures.
- Detected and resolved technical issues.
- Provided consistent and customized training to teams within the business.
- Built reports for teams across the business.
- Cleansed data and implemented improved decision-making using Alteryx.
- Managed upgrading and downgrading of AWS resources.
- Demonstrated a strong understanding of advanced Tableau features, including calculated fields, parameters, table calculations, row-level security, R integration, joins, data blending, and dashboard actions.
- Performed deep gap analysis on the project using Tableau, as there were numerous 'As-Is' and 'To-Be' conditions.
Environment: Alteryx, Python, Django, Tableau, SQL, Excel, MS Office, AWS, Informatica, MS Azure Data Lake
Confidential, Arlington, TX
Hadoop Developer/Data Engineer
Responsibilities:
- Gathered data from different legitimate sources and cleaned it.
- Developed mapping documents to outline data flow from sources to targets.
- Imported data from various data sources, performed transformations using Hive, MapReduce, Spark, and Pig, and loaded the data into HDFS.
- Used Alteryx to profile and analyze source system data to identify edge cases and quality issues, and communicated the findings to the required audience.
- Developed mapplets for reuse across different mappings.
- Developed batch ETL workflows using SQL queries and Alteryx logic, targeting the refined data to a specific schema.
- Developed end-to-end data pipelines connecting to Azure Data Lake through heavy SQL procedures, alongside end-to-end Alteryx workflows as required for specific wireline customers.
- Used Spark SQL to process data from Hive tables for faster processing of analytical data.
- Created different transformations for loading the data into SQL.
- Used tools such as Tableau to visualize the results.
- Performed QA of new data feeds or views required for repeatable tools.
- Wrote SQL scripts to create and maintain custom database schemas for multiple applications that used multiple product tables.
- Defined reporting approaches, best practices, performance tuning and development.
- Implemented data preparation for proactive analytics using Alteryx.
- Implemented appropriate load balancing using AWS.
- Met regularly with various audit stakeholders and managed project timelines and milestones.
Environment: Alteryx, SQL, Oracle PL/SQL, Tableau, Excel, MS Office, AWS, Informatica, Hive, MapReduce, Spark, Pig, Azure Data Lake
Confidential
Hadoop Developer/Data Engineer
Responsibilities:
- Responsible for collecting data from different sources and cleaning it.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS, then applied Spark transformations and actions.
- Developed end-to-end data pipelines connecting to Azure Data Lake through heavy SQL procedures, alongside end-to-end Alteryx workflows as required for specific telecom customers.
- Used Alteryx to profile and analyze source system data to identify edge cases and quality issues, and communicated the findings to the required audience.
- Translated high-level design specifications into simple ETL coding and mapping standards.
- Used Python programming logic along with Alteryx to analyze data and build business decision-oriented reports.
- Developed batch ETL workflows using SQL queries and Alteryx logic, targeting the refined data to a specific schema.
- Built advanced charts in Tableau Desktop, such as box-and-whisker plots, bullet charts, and waterfall charts, to measure sales growth by department within the company.
- Implemented different data structure logic and transformations for loading the data into SQL.
- Applied different axes in Tableau to visualize the results from different angles and report them.
- Wrote SQL scripts to create and maintain custom database schemas for multiple applications that used multiple product tables.
- Applied reporting approaches, best practices, performance tuning and development.
- Resolved performance issues by using Tableau's data extract option to limit the queries that workbooks issue when connected to a live database.
- Implemented appropriate load balancing using AWS.
- Participated regularly in meetings with various audit stakeholders and managed project timelines and milestones.
Environment: Alteryx, SQL, Oracle PL/SQL, Tableau, Excel, MS Office, Hive, Sqoop
