
Data Engineer Resume


OH

SUMMARY

  • 5+ years of IT industry experience in Data Engineering, Data Science, and Full Stack Development.
  • Experienced in working with Google Cloud Platform and Hadoop distributions such as Cloudera and Hortonworks.
  • Hands-on experience with major components of the Hadoop ecosystem, including HDFS, Hive, Spark, Sqoop, Oozie, and YARN.
  • Experienced in importing and exporting data between RDBMS and HDFS/Hive.
  • Good working knowledge of creating and maintaining Hive tables, partitions, and bucketing; wrote Spark SQL for data analysis and processing to meet business requirements.
  • Experienced in Python full-stack development with the Flask framework.
  • Experienced in developing and monitoring IBM DataStage jobs using various Processing and Debug stages.
  • Experienced in creating data quality rules in Ataccama to validate ETL processes in IBM DataStage.
  • Experienced in deploying and testing Python Flask applications in Azure DevOps.
  • Developed Terraform code to create Google Cloud Storage bucket objects and BigQuery tables.
  • Hands-on experience with machine learning algorithms such as logistic regression, random forest, linear regression, and K-means.
  • Highly skilled in using visualization tools such as Tableau, R, and Spotfire.
  • Highly motivated, with a strong sense of achievement and the willingness to learn and adapt to new technologies.
  • Strong team player with the ability to quickly triage and troubleshoot complex problems.
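As an illustration of the Terraform work mentioned above, a minimal sketch of creating a GCS bucket object and a BigQuery table; all resource names, bucket names, and the schema are placeholders, not taken from any actual project:

```hcl
# Hypothetical names throughout; shown only to illustrate the pattern.
resource "google_storage_bucket_object" "sample_object" {
  name   = "landing/sample.csv"
  bucket = "example-data-bucket"
  source = "./sample.csv"
}

resource "google_bigquery_table" "sample_table" {
  dataset_id = "example_dataset"
  table_id   = "sample_table"
  schema = jsonencode([
    { name = "id",   type = "INTEGER", mode = "REQUIRED" },
    { name = "name", type = "STRING",  mode = "NULLABLE" },
  ])
}
```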

TECHNICAL SKILLS

Big Data Stacks: Hadoop stack (Hive, Spark, YARN, Sqoop, Oozie), NoSQL (Cassandra, HBase)

Languages: C, C++, Java, Python, JavaScript, HTML, XML, PHP, R, MATLAB, SQL

Cloud Platforms: Azure DevOps; Google Cloud (Cloud Storage, Pub/Sub, BigQuery)

Operating Systems: Windows, UNIX / Linux

Database: MySQL, Oracle 9i/10g, SQL Server, MariaDB, IBM DB2

ETL Tools: IBM DataStage 11.5

Data Quality Tool: Ataccama 12.5

Data Design / Modeling Tools: Erwin, MS Visio

Data Science Tools: KNIME, Alteryx, Jupyter

Visualization Tools: R, Tableau, Spotfire

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, OH

Responsibilities:

  • Gather Business Requirements, create Documentation, and analyze Data in IBM DB2.
  • Develop and monitor IBM DataStage jobs using various Processing stages such as Transformer, Aggregator, Filter, Lookup, Remove Duplicates, Merge, Copy, Join, and Sort, and Development/Debug stages such as Row Generator, Column Generator, Head-Tail, and Peek.
  • Migrate commercial customer data from DB2 to Salesforce nCino using DataStage 11.5
  • Debug, test, fix data transformation process in various stages for parallel jobs.
  • Create Data Quality rules in Ataccama IDE.
  • Write complex SQL queries and perform query performance tuning.
  • Work on Production deployment and support process.

Environment: IBM DataStage 11.5, Python3.7, Ataccama v12.5, AQT, IBM DB2, Alation, MS Visio
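The data quality rules themselves were defined in the Ataccama IDE; as a hedged Python sketch, the same kind of checks (a completeness rule and a format rule) applied to a customer record before loading. Field names and the tax-ID format are illustrative assumptions, not the project's actual schema.

```python
import re

# Assumed EIN-style format (NN-NNNNNNN); illustrative only.
TAX_ID_RE = re.compile(r"^\d{2}-\d{7}$")

def validate_record(rec: dict) -> list:
    """Return a list of data-quality rule violations for one record."""
    issues = []
    if not rec.get("customer_name"):
        issues.append("customer_name is missing")   # completeness rule
    if rec.get("tax_id") and not TAX_ID_RE.match(rec["tax_id"]):
        issues.append("tax_id fails format rule")   # format rule
    return issues
```

Records with a non-empty issue list would be rejected or routed to a remediation queue before the DataStage load.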

Full Stack Data Engineer

Confidential, GA

Responsibilities:

  • Responsible for developing and automating the network engineering tool eROM.
  • Responsible for requirement gathering, development, deployment, testing, and management of one complete project.
  • Create front-end forms and graphs in JavaScript that help users enter input and review data.
  • Implement a Flask web service to connect the front end and back end and to perform mathematical calculations and transformations.
  • Load data into MariaDB using Python scripts.
  • Deploy and test the application in the Azure DevOps Dev, Test, and Prod environments.

Environment: JavaScript, Python 3.7, Anaconda, Python Flask, MariaDB, Red Hat 7
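A minimal sketch of such a Flask web service bridging the JavaScript front end and the back-end calculations; the endpoint name, payload shape, and transformation are illustrative assumptions, not the actual eROM API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def scale_and_round(values, factor):
    """Example back-end transformation: scale each input and round to 2 places."""
    return [round(v * factor, 2) for v in values]

@app.route("/transform", methods=["POST"])
def transform():
    # Front-end form posts JSON like {"values": [...], "factor": 2.0}
    payload = request.get_json(force=True)
    result = scale_and_round(payload["values"], payload.get("factor", 1.0))
    return jsonify({"result": result})

# In a deployment pipeline this would run behind a production WSGI server
# rather than app.run().
```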

Data Engineer / Analyst

Confidential, OH

Responsibilities:

  • Responsible for developing infrastructure for Google Cloud Platform and participated in cloud architecture meetings.
  • Developed Python code to send bucket/table notifications via Pub/Sub.
  • Loaded supply chain data daily on an incremental basis into BigQuery using Google Dataproc, GCS buckets, Hive, Python, and gsutil.
  • Performed exploratory data analysis on large sets of data at rest in Hadoop to build a curated data layer for data science activities.
  • Contributed to all stages of data science and decision modeling projects, including problem formulation, solution development, and deployment.
  • Worked with business teams to translate business-relevant scientific, engineering, and commercial problems into questions that can be addressed using data science.
  • Well versed in using one or more of the following software packages: scikit-learn, NumPy, pandas, Jupyter, Matplotlib, SciPy, and Keras.
  • Experienced in solving problems using one or more of the following techniques: regression, decision trees, random forest, boosting, PCA, and K-means.

Environment: GCP, Cloudera Hadoop, BigQuery, Pub/Sub, Hue, Python, YAML, Spark
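A hedged sketch of the daily incremental load path described above: stage one day's supply chain extract to a GCS bucket with gsutil, then load it into a date-partitioned BigQuery table using the `table$YYYYMMDD` partition decorator. The bucket, dataset, table, and local paths are placeholders, not the project's actual names.

```python
from datetime import date

def build_load_commands(run_date: date, bucket="example-supplychain-stage",
                        dataset="supply_chain", table="orders"):
    """Build the gsutil copy and bq load commands for one day's increment."""
    day = run_date.strftime("%Y%m%d")
    gcs_uri = f"gs://{bucket}/orders/dt={day}/*.parquet"
    # Stage the day's extract to GCS (parallel copy)
    copy_cmd = f"gsutil -m cp /data/exports/orders_{day}/*.parquet gs://{bucket}/orders/dt={day}/"
    # Load into that day's partition via the $YYYYMMDD partition decorator
    load_cmd = (f"bq load --source_format=PARQUET "
                f"{dataset}.{table}${day} {gcs_uri}")
    return copy_cmd, load_cmd
```

A scheduler (Oozie, cron, or similar) would invoke this once per day with the previous day's date.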

Data Engineer

Confidential, TX

Responsibilities:

  • Responsible for developing schemas for drilling data in NoSQL Database.
  • Developed scripts to ingest data from external systems to Hadoop.
  • Responsible for developing data pipelines using StreamSets.
  • Ingested data from the WellView server into HDFS storage.
  • Batch loading of various drilling data types into a big data store.
  • Used GIT repository for tracking changes and coordinating work in the team.
  • Parsed, enriched, ingested, and tested the quality of drilling data using Python and Apache Spark.
  • Develop data processing pipelines and machine learning algorithms (e.g., regression, random forest) to predict dysfunctions.
  • Used various Python libraries (matplotlib, plotly, and dash) to visualize data.
  • Supported ad hoc query and extract requests from other teams.
  • Visualized data in TIBCO Spotfire.
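The parse/enrich/quality-check work above ran on Apache Spark; this pure-Python sketch shows the same shape on a small CSV sample. The field names, units, and the non-negative-ROP quality rule are illustrative assumptions, not WellView's actual schema.

```python
import csv
import io

def parse_and_enrich(csv_text: str):
    """Parse raw CSV rows, enrich with a derived field, and flag bad rows."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            depth = float(row["depth_ft"])
            rop = float(row["rop_ft_per_hr"])      # rate of penetration
        except (KeyError, ValueError):
            bad.append(row)                        # unparseable row
            continue
        row["depth_m"] = round(depth * 0.3048, 2)  # enrichment: metric depth
        (good if rop >= 0 else bad).append(row)    # quality rule: ROP non-negative
    return good, bad
```

In the Spark version the same logic would live in a DataFrame transformation applied partition-wise across the full drilling dataset.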

Java AEM Developer

Confidential

Responsibilities:

  • Develop and design a website for the insurance domain using Adobe AEM.
  • Good experience with Java, JSP, CRX, JCR, Felix, OSGi, and other technologies involved in deploying solutions based on the Adobe AEM framework.
  • Used CRXDE and Brackets for component and template development, and Eclipse Mars for Java-based implementations.
  • Coordinated and implemented with third party vendors for web chat component using AJAX and Restful services.
  • Performed unit testing on various Jira tickets and components.
  • Worked on version migration issues from CQ5.5 to AEM6.1.
  • Execute the process of object model design, implementation, and unit testing.
Environment: Azure, Hadoop, Hive, Impala, HDFS, Spyder, Jupyter, Apache Spark, Cassandra, Python, StreamSets, CentOS, TIBCO Spotfire, Microsoft VSTS, GIT, Adobe CQ5.x/AEM6.x, JSP, JCR, CRXDE, DAM, OSGi, HTML, CSS, JavaScript, Eclipse, AngularJS, JVM 1.8, Maven, Apache Tomcat, JIRA

Software Developer

Confidential, TX

Responsibilities:

  • Implement machine learning algorithms in R, building natural language processing systems.
  • Collect, track, and integrate multiple sources of big data.
  • Maintain SQL scripts to create and populate tables in data warehouse for daily reporting.
  • Use statistical modeling and machine learning techniques to build models.
  • Construct different supervised machine learning models (logistic regression, support vector machine, K-nearest neighbors, etc.) in R and Java.
  • Work with business teams to create Hive queries for ad hoc analysis.
  • Publish blog posts to promote the company's analytics platform.
  • Evaluate the performance of various algorithms/models/strategies on real-world datasets.
  • Use analytical tools and regression analysis to create predictive models.
  • Use shiny dashboard, dygraphs, and plotly to develop professional-quality interfaces for data interaction.
Environment: R, Java, Oracle 10g, Tableau, SAS, Hive
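As a hedged sketch of one of the supervised models listed above, a minimal K-nearest neighbors classifier; the original work used R and Java, so this Python version and its toy data are illustrative only.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; returns the majority label
    among the k training points nearest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda point: math.dist(point[0], query))
    labels = [label for _, label in nearest[:k]]
    return Counter(labels).most_common(1)[0][0]
```

For example, with two clusters of labeled points, a query near the first cluster is assigned that cluster's label by majority vote of its k neighbors.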
