Data Engineer Resume
FL
SUMMARY
- Over 6 years of professional IT experience in requirement gathering, design, development, testing, implementation and maintenance, with progressive experience in all phases of the iterative Software Development Life Cycle (SDLC).
- Deep expertise in the Analysis, Design, Development and Testing phases of Enterprise Data Warehousing solutions.
- More than 5 years of experience in Hadoop/Big Data technologies such as Hadoop, Teradata, Hive, HBase, Oozie, ZooKeeper, Sqoop, Kafka, Avro, Impala, Vertica and Spark, with hands-on experience writing MapReduce/YARN and Spark/Scala jobs.
- Around 3 years of experience in developing OLAP reports and dashboards in Tableau Desktop 8.x/9.x, Tableau Server, Business Objects Desktop Intelligence, Web Intelligence, Universe Designer, Crystal Reports and Central Management Console.
- Excellent understanding and knowledge of NoSQL databases like HBase, Cassandra and MongoDB.
- Experience in migrating data using Sqoop from Hadoop to relational database systems and vice versa.
- Extensive work experience in ETL processes consisting of data sourcing, data transformation, mapping and loading of data from multiple source systems into a Data Warehouse using Informatica PowerCenter.
- Involved in building all domain pipelines using Spark DataFrames, Spark Streaming and Spark batch processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the PySpark sketch after this list).
- Knowledge of Teradata BTEQ, FastLoad, FastExport and MultiLoad (MLOAD) scripts.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Servlets, JSP, JBoss and JavaScript.
- Good middleware skills in J2EE and web services with application servers (Apache Tomcat, BEA WebLogic, IBM WebSphere, JBoss), with experience on heterogeneous operating systems.
- Proficient in SQL and PL/SQL programming, including Triggers, Stored Procedures, Functions and Packages, for developing applications.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Experience in writing downstream and upstream pipelines using Python.
- Involved in designing application system requirements and coded the back end in Python.
- Good experience in writing complex SQL queries against databases like DB2, Oracle 11g, MySQL and SQL Server.
- Experience using integrated development environments like Eclipse, NetBeans, JDeveloper and MyEclipse.
- Experience in administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring and fine-tuning the Unix operating system.
- Created various views in Tableau such as Tree Maps, Heat Maps, Scatter Plots, Geographic Maps, Line Charts and Pie Charts.
- Worked in an Agile software development process as a Scrum Master.
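Several bullets above describe converting Hive/SQL queries into Spark transformations. A minimal sketch of that pattern, for illustration only: the resume mentions Spark RDDs and Scala, but this sketch uses the PySpark DataFrame API, and the table and column names (transactions, region, amount) are hypothetical.

```python
# Minimal PySpark sketch: rewriting a HiveQL aggregate as Spark transformations.
# Table and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()   # read tables registered in the Hive metastore
         .getOrCreate())

# Original HiveQL:
#   SELECT region, SUM(amount) AS total
#   FROM transactions WHERE amount > 0 GROUP BY region;
totals = (spark.table("transactions")
               .filter(F.col("amount") > 0)
               .groupBy("region")
               .agg(F.sum("amount").alias("total")))
totals.show()
```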
PROFESSIONAL EXPERIENCE
Confidential, FL
Data Engineer
Responsibilities:
- Created data ingestion pipelines using Python for loading data into HDFS from MS SQL, MySQL and Oracle, and vice versa (a minimal sketch of this pattern follows this list).
- Monitored job progress in Chronos.
- Developed visualizations on UNIDASH to visually update and report the outputs.
- Worked on data extraction strategies using Facebook’s Lineage.
- Worked on coordination services through Dataswarm.
- Created a data ingestion pipeline for ETL.
- Involved in testing and the migration to Presto and Spark.
- Created lite versions of important securities tables to meet SLAs.
- Added a DQ operator to different pipelines for proper data quality checks.
- Involved in UHAUL moves of tables between different namespaces.
- Used Microsoft Excel for data analysis.
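A minimal sketch of the Python ingestion pattern described in the first bullet, assuming a SQLAlchemy-readable source and WebHDFS access via the `hdfs` package; the connection string, table name and HDFS path are hypothetical placeholders.

```python
# Minimal sketch: pull a table from an RDBMS and land it in HDFS as CSV.
# Connection string, table name and HDFS path are placeholders.
import pandas as pd
from sqlalchemy import create_engine
from hdfs import InsecureClient  # WebHDFS client from the `hdfs` package

engine = create_engine("mysql+pymysql://etl_user:secret@dbhost:3306/sales")
client = InsecureClient("http://namenode:9870", user="etl")

# Stream the table in chunks so large tables do not exhaust memory.
with client.write("/data/raw/orders.csv", encoding="utf-8", overwrite=True) as out:
    for i, chunk in enumerate(
            pd.read_sql("SELECT * FROM orders", engine, chunksize=50_000)):
        chunk.to_csv(out, index=False, header=(i == 0))
```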
Environment: HDFS, Informatica, Presto, HIVE, Python, Oracle 11g, MS-SQL, MySQL
Confidential, Fremont, CA
Data Engineer
Responsibilities:
- Created Python tasks for loading data into HDFS from Oracle, and vice versa, on the Airflow framework (see the DAG sketch after this list).
- Used Spark on Hive to process large amounts of users' banking data.
- Created a data lake from different heterogeneous sources.
- Migrated Informatica jobs to Hadoop jobs for processing and migrating data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
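A minimal Airflow sketch of scheduling such a load, assuming Airflow 2.x; the DAG id, schedule and the `load_oracle_to_hdfs` callable are hypothetical.

```python
# Minimal Airflow 2.x DAG sketch: one daily Oracle-to-HDFS load task.
# The DAG id, schedule and callable are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_oracle_to_hdfs():
    # Placeholder for the extract/load logic (the earlier pandas/WebHDFS
    # sketch shows one way such a task body could look).
    pass

with DAG(
    dag_id="oracle_to_hdfs_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_oracle_to_hdfs",
        python_callable=load_oracle_to_hdfs,
    )
```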
Environment: HDFS, Informatica, HIVE, Python, Java, Oracle 11g, Spark, IHub, Airflow
Confidential - Alexandria, VA.
Data Engineer
Responsibilities:
- Created data ingestion pipelines using Python to load data into HDFS from Vertica, MySQL and Oracle, and vice versa.
- Monitored job progress in Chronos.
- Configured Dataswarm for Presto to process these large data sets.
- Developed visualizations in Tableau to visually update and report the outputs.
- Configured workflows to run on top of Hadoop; these workflows comprise heterogeneous jobs such as Presto, Hive and Vertica.
- Developed job flows in Dataswarm to automate the workflow for extracting data from warehouses.
- Worked on data extraction strategies using Facebook’s Lineage.
- Worked on coordination services through Dataswarm.
- Automated productivity reports to ensure the daily and weekly revenue goals were met accurately within established deadlines.
- Created a data ingestion pipeline for ETL.
- Involved in testing and the migration to Presto.
- Involved in backfilling data from DW01 and DW02 to Hive, LW01 and LW03.
- Created lite versions of important fact tables to meet SLAs (see the Presto sketch after this list).
- Involved in UHAUL moves of tables between different namespaces.
- Used Microsoft Excel for revenue analysis.
- Actively took ownership of RDF during on-call rotations.
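A minimal sketch of building a "lite" fact table on Presto, assuming the PyHive client; the host, schema and all table and column names are hypothetical.

```python
# Minimal sketch: materialize a slimmed-down copy of a wide fact table on
# Presto so SLA-bound jobs scan fewer columns. All names are placeholders.
from pyhive import presto

conn = presto.connect(host="presto-coordinator", port=8080,
                      catalog="hive", schema="warehouse")
cur = conn.cursor()

# Keep only the columns and recent partitions the downstream jobs need.
cur.execute("""
    CREATE TABLE fact_revenue_lite AS
    SELECT ds, account_id, revenue
    FROM fact_revenue
    WHERE ds >= '2024-01-01'  -- retention window; a literal for simplicity
""")
cur.fetchall()  # block until the CTAS finishes
```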
Environment: Hadoop 2.x, HDFS, Tableau, Vertica, Informatica, Presto, MapReduce, HIVE, Python, Oracle 11g
Confidential - Bloomington, IL
Data Engineer
Responsibilities:
- Implemented a CDH3 Hadoop cluster on Red Hat Linux.
- Involved in Data Warehousing and worked extensively on RDBMS (Oracle, MS SQL), ETL (Informatica) and Oracle DRM (Data Relationship Management), and Reporting (BIDS - Visual Studio, and Oracle Reports).
- Provided solutions using ETL tools such as SSIS.
- Facilitated knowledge transfer sessions for new resources.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java/Python for data cleaning and preprocessing.
- Worked on a live Hadoop cluster running Cloudera, with 200 nodes in pre-production and 400 in production.
- Worked with terabytes of semi-structured data.
- Gave extensive presentations about the Hadoop ecosystem, best practices and data architecture in Hadoop.
- Experience in defining job flows.
- Used AWS S3, Redshift storage for data analysis.
- Experienced with Spark using Scala and Python.
- Experience in running Hadoop Streaming jobs to process terabytes of XML-format data (see the mapper sketch after this list).
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Executed queries using Hive and developed MapReduce jobs to analyze data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed a custom File System plug-in for Hadoop so it can access files on the Data Platform. This plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
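A minimal Hadoop Streaming mapper sketch for the XML processing mentioned above, assuming one XML record per input line; the `event` element and `type` attribute names are hypothetical. It would be submitted through the standard hadoop-streaming jar with the usual -input/-output/-mapper/-reducer options.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper: parse one XML record per input line and
# emit tab-separated key/value pairs for the reducer to sum. Tag and
# attribute names are illustrative placeholders.
import sys
import xml.etree.ElementTree as ET

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        record = ET.fromstring(line)   # e.g. <event type="click" ... />
    except ET.ParseError:
        continue                       # skip malformed records
    event_type = record.get("type", "unknown")
    print(f"{event_type}\t1")          # reducer sums counts per event type
```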
Environment: ETL, Hadoop, Hive, HBase, MapReduce, HDFS, Pig, Cassandra, Storm, Flume, IBM DataStage 8.1, Oracle 11g/10g, PL/SQL, SQL*Plus, Linux, UNIX Shell Scripting, Java, Python
Confidential
SQL Developer
Responsibilities:
- Worked with application developers and production teams across functional units to identify business needs and discuss solution options.
- Designed and implemented SSIS packages to migrate data from heterogeneous data sources to a staging database and load the data into the data warehouse or data marts.
- Created T-SQL queries, complex Stored Procedures and User-Defined Functions, and designed and implemented Database Triggers (DDL, DML), Views and Indexes (see the pyodbc sketch after this list).
- Tuned SQL queries, maintained data integrity and data consistency, and performed performance tuning and query optimization.
- Worked with subject matter experts (SMEs) and the project team to identify, define, collate, document and communicate the data migration requirements.
- Generated ad-hoc reports in Excel Power Pivot and shared them using Power BI with decision makers for strategic planning.
- Utilized Power Query in Power BI to pivot and unpivot the data model for data cleansing and data massaging.
- Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
- Utilized the Power BI gateway to keep dashboards and reports up to date with on-premises data sources.
- Implemented code check-in/check-out and managed multiple versions of complicated code within TFS.
- Provisioned VMs with SQL Server in the cloud using Microsoft Azure and set up communication with the help of endpoints.
- Migrated an enterprise database to Microsoft Azure with Redgate and Azure Data Factory.
- Helped develop a backup and recovery strategy for databases on a virtualization platform utilizing Microsoft Azure.
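A minimal sketch of the T-SQL stored-procedure work described above, driven from Python via pyodbc; the server, database and all object names are hypothetical, and the procedure is assumed not to exist yet.

```python
# Minimal sketch: create and call a simple T-SQL stored procedure via pyodbc.
# Server, database and object names are illustrative placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlhost;DATABASE=StagingDB;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Assumes dbo.usp_GetOrdersSince does not already exist.
cur.execute("""
CREATE PROCEDURE dbo.usp_GetOrdersSince
    @Since DATE
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate >= @Since;
END
""")
conn.commit()

for row in cur.execute("EXEC dbo.usp_GetOrdersSince ?", "2024-01-01"):
    print(row.OrderID, row.OrderDate)
```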
Environment: Windows Server 2003, IIS 6.0, MS SQL Server 2014/2012/2010/2008/2005 Enterprise Edition, SSIS, SSAS, SSRS, T-SQL, DTS, SQL Profiler, OLAP, OLTP, Erwin, PerformancePoint Server, Office 2007, MS Excel