Data Engineer Resume
OH
SUMMARY
- 5+ years of IT industry experience in Data Engineering, Data Science, and Full Stack Development.
- Experienced in working with Google Cloud Platform and Hadoop distributions such as Cloudera and Hortonworks.
- Hands-on experience with major components of the Hadoop ecosystem, including HDFS, Hive, Spark, Sqoop, Oozie, and YARN.
- Experienced in importing and exporting data between RDBMS and HDFS/Hive.
- Good working knowledge of creating and maintaining Hive tables, partitions, and bucketing; wrote Spark SQL for data analysis and processing to meet business requirements.
- Experienced in Python full-stack development with the Flask framework.
- Developed and monitored IBM DataStage jobs using various Processing and Debug stages.
- Experienced in creating data quality rules in Ataccama to validate ETL processes in IBM DataStage.
- Deployed and tested Python Flask applications in Azure DevOps.
- Developed Terraform code to create Google Cloud Storage bucket objects and BigQuery tables.
- Hands-on experience with machine learning algorithms such as logistic regression, random forest, linear regression, and K-means.
- Highly skilled in using visualization tools such as Tableau, R, and Spotfire.
- Highly motivated, with a strong sense of achievement and the willingness to learn and adapt to new technologies.
- Strong team player with the ability to quickly triage and troubleshoot complex problems.
TECHNICAL SKILLS
Big Data Stacks: Hadoop stack (Hive, Spark, YARN, Sqoop, Oozie), NoSQL (Cassandra, HBase)
Languages: C, C++, Java, Python, JavaScript, HTML, XML, PHP, R, MATLAB, SQL
Cloud Platforms: Azure DevOps; Google Cloud (Storage, Pub/Sub, BigQuery)
Operating Systems: Windows, UNIX/Linux
Databases: MySQL, Oracle 9i/10g, SQL Server 2009, MariaDB, IBM DB2
ETL Tools: IBM DataStage 11.5
Data Quality Tool: Ataccama 12.5
Data Design / Modeling Tools: Erwin, MS Visio
Data Science Tools: KNIME, Alteryx, Jupyter
Visualization Tools: R, Tableau, Spotfire
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, OH
Responsibilities:
- Gather business requirements, create documentation, and analyze data in IBM DB2.
- Develop and monitor IBM DataStage jobs using Processing stages such as Transformer, Aggregator, Filter, Lookup, Remove Duplicates, Merge, Copy, Join, and Sort, and Development/Debug stages such as Row Generator, Column Generator, Head, Tail, and Peek.
- Migrate commercial customer data from DB2 to Salesforce nCino using DataStage 11.5.
- Debug, test, and fix data transformation processes in various stages of parallel jobs.
- Create data quality rules in the Ataccama IDE.
- Write complex SQL queries and perform query performance tuning.
- Work on the production deployment and support process.
Environment: IBM DataStage 11.5, Python 3.7, Ataccama v12.5, AQT, IBM DB2, Alation, MS Visio
Full Stack Data Engineer
Confidential, GA
Responsibilities:
- Responsible for developing and automating the network engineering tool eROM.
- Responsible for requirement gathering, development, deployment, testing, and management of one complete project.
- Created front-end forms and graphs in JavaScript that help users enter input and review data.
- Implemented a Flask web service to connect the front end and back end and to perform mathematical calculations and transformations.
- Loaded data into MariaDB using Python scripts.
- Deployed and tested the application in the Azure DevOps Dev, Test, and Prod environments.
Environment: JavaScript, Python 3.7, Anaconda, Flask, MariaDB, Red Hat 7.
Data Engineer / Analyst
Confidential, OH
Responsibilities:
- Responsible for developing infrastructure for Google Cloud Platform and participated in cloud architecture meetings.
- Developed Python code to send bucket/table notifications via Pub/Sub.
- Loaded supply chain data into BigQuery daily on an incremental basis using Google Dataproc, GCS buckets, Hive, Python, and gsutil.
- Performed exploratory data analysis on large sets of data at rest in Hadoop to build a curated data layer for data science activities.
- Contributed to all stages of data science or decision modeling projects, including problem formulation, solution development and deployment.
- Worked with business teams to translate business-relevant scientific, engineering, and commercial problems into questions that can be addressed using data science.
- Well versed in one or more of the following software packages: scikit-learn, NumPy, pandas, Jupyter, Matplotlib, SciPy, and Keras.
- Experienced in solving problems using one or more of the following techniques: regression, decision trees, random forest, boosting, PCA, and K-means.
Environment: GCP, Cloudera Hadoop, BigQuery, Pub/Sub, Hue, Python, YAML, Spark
Data Engineer
Confidential, TX
Responsibilities:
- Responsible for developing schemas for drilling data in NoSQL Database.
- Developed scripts to ingest data from external systems to Hadoop.
- Responsible for developing data pipelines using StreamSets.
- Ingested data from the WellView server into HDFS storage.
- Batch loading of various drilling data types into a big data store.
- Used GIT repository for tracking changes and coordinating work in the team.
- Parsed, enriched, ingested, and tested the quality of drilling data using Python and Apache Spark.
- Developed data processing pipelines and machine learning algorithms (e.g., regression, random forest) to predict dysfunctions.
- Used various Python libraries (Matplotlib, Plotly, and Dash) to visualize data.
- Supported ad hoc query and extract requests from other teams.
- Visualized data in TIBCO Spotfire.
Java AEM Developer
Confidential
Responsibilities:
- Develop and Design website for Insurance domain using Adobe AEM.
- Good experience with Java, JSP, CRX, JCR, Felix, OSGi, and other technologies involved in deploying solutions based on the Adobe AEM framework.
- Used CRXDE and Brackets for component and template development, and Eclipse Mars for Java-based implementations.
- Coordinated with third-party vendors to implement a web chat component using AJAX and RESTful services.
- Performed unit testing on various Jira tickets and components.
- Worked on version migration issues from CQ5.5 to AEM6.1.
- Executed the process of object model design, implementation, and unit testing.
Environment: Azure, Hadoop, Hive, Impala, HDFS, Spyder, Jupyter, Apache Spark, Cassandra, Python, StreamSets, CentOS, TIBCO Spotfire, Microsoft VSTS, GIT, Adobe CQ5.x/AEM 6.x, JSP, JCR, CRXDE, DAM, OSGi, HTML, CSS, JavaScript, Eclipse, JVM 1.8, AngularJS, Maven, Apache Tomcat, JIRA.
Software Developer
Confidential, TX
Responsibilities:
- Implement machine learning algorithms in R and build natural language processing systems.
- Collect, track, and integrate multiple sources of big data.
- Maintain SQL scripts to create and populate tables in data warehouse for daily reporting.
- Use statistical modeling and machine learning techniques to build models.
- Construct supervised machine learning models (logistic regression, support vector machine, K-nearest neighbors, etc.) in R and Java.
- Work with business teams to create Hive queries for ad hoc analysis.
- Publish blog posts to promote the company's analytics platform.
- Evaluate the performance of various algorithms/models/strategies on real-world datasets.
- Use analytical tools and regression analysis to create predictive models.
- Use shiny dashboard, dygraphs, and plotly to develop professional-quality interfaces for data interaction.
Environment: R, Java, Oracle 10g, Tableau, SAS, Hive