Data Engineer Resume
SUMMARY
- 5+ years of IT industry experience analyzing, designing, testing, and maintaining database, ETL, Big Data, Cloud, and Data Warehouse applications.
- Experience designing, developing, executing, and maintaining data extraction, transformation, and loading (ETL) processes for multiple corporate Operational Data Store, Data Warehouse, and Data Mart systems.
- Strong experience in Business and Data Analysis, Data Profiling, Data Migration, Data Integration, Data governance and Metadata Management, Master Data Management and Configuration Management.
- Ability to collaborate with peers in both business and technical areas to deliver optimal business process solutions in line with corporate priorities.
- Strong experience interacting with stakeholders/customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, and identifying and analyzing risks using appropriate templates and analysis tools.
- Strong in ETL, Data Warehousing, Operational Data Store concepts, Data Marts, and OLAP technologies.
- Hands-on with Data Science and Machine Learning libraries such as Pandas, NumPy, SciPy, Matplotlib, Seaborn, Bokeh, NLTK, Scikit-learn, OpenCV, TensorFlow, Theano, and Keras.
- Skilled in handling Big Data ecosystems including Apache Hadoop, MapReduce, Spark, HDFS architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, and ELT.
- Experience with AWS services including RDS, networking, Route 53, IAM, S3, EC2, EBS, and VPC, as well as administering AWS resources using the Console and CLI.
- Hands-on experience building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using NoSQL and SQL on AWS and Big Data technologies (DynamoDB, Kinesis, S3, Hive/Spark).
- Experience in Data Modeling, with expertise in creating Star and Snowflake schemas, Fact and Dimension tables, and Physical and Logical data models using Erwin and Embarcadero.
- Supported integrated testing of interfaces and validation of data mapping, data migration, and data conversion activities conducted during pre-go-live and post-go-live phases.
- Experience building and optimizing big data pipelines, architectures, and data sets (HiveMQ, Kafka, Cassandra, S3, Redshift).
- Technically proficient in designing and data modeling online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
- Strong understanding of AWS (Amazon Web Services) processes and concepts, including S3, Amazon RDS, and Apache Spark RDDs; develop logical data architecture with adherence to Enterprise Architecture.
- Extensive experience in designing, developing and publishing visually rich and intuitively interactive Tableau workbooks and dashboards for executive decision making.
- Involved in all phases of the software development life cycle under Agile, Scrum, and Waterfall management processes.
- A self-motivated, enthusiastic learner, comfortable with challenging projects and ambiguity, able to solve complex problems independently or in a collaborative team.
TECHNICAL SKILLS
Databases: Snowflake, AWS RDS, Teradata, Oracle 9i/10g, MySQL 5.5/5.6, Microsoft SQL Server 2008/12/14/16, PostgreSQL.
NoSQL Databases: MongoDB 3.x, Hadoop HBase 0.98 and Apache Cassandra.
Programming Languages: Python 2.x/3.x, R 3.x, SQL, Scala, Pig, C, C++, MATLAB, Java, JavaScript.
Cloud Technologies: AWS, Docker
Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server.
Deployment Tools: Anaconda Enterprise v5, R-Studio, Azure Machine Learning Studio, Oozie 4.2, AWS Lambda.
Scalable Data Tools: Hadoop, Hive 1.x/2.x, Apache Spark 2.x, Pig 0.15, Map Reduce, Sqoop.
Operating Systems: Red Hat Linux, Unix, Ubuntu, Debian, CentOS, Windows, macOS.
Reporting & Visualization: Tableau 9.x/10.x, Matplotlib, Seaborn, Bokeh, ggplot, iplots, Shiny.
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Maintained and developed complex SQL queries, views, functions, and reports on Snowflake that meet customer requirements.
- Performed analysis, auditing, forecasting, programming, research, report generation, and software integration, building an expert understanding of the current end-to-end BI platform architecture to support the deployed solution.
- Developed and refined test plans to ensure successful project delivery. Employed performance analytics predicated on high-quality data to develop reports and dashboards with actionable insights.
- Worked with the ETL team to document the transformation rules for Data migration from OLTP to Warehouse environment for reporting purposes.
- Developed and implemented several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using Tableau.
- Produced parameterized sales performance reports each month and distributed them to the respective departments/clients using Tableau.
- Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data with Spark SQL (see the sketch after this list).
- Worked on ingestion of applications/files from one Commercial VPC to OneLake.
- Worked on building EC2 instances, creating IAM users and groups, and defining policies.
- Worked on creating S3 buckets and applying bucket policies per client requirements.
- Performed data wrangling to clean, transform, and reshape the data using the pandas library (a pandas sketch appears at the end of this section). Analyzed data using SQL, Scala, Python, and Apache Spark and presented analytical reports to management and technical teams.
- Created High-Level and Low-Level design documents per the business requirements and worked with the offshore team to guide them on design and development.
- Continuously monitored processes that took longer than expected to execute and tuned them.
- Optimized existing pivot table reports using Tableau and proposed an expanded set of views in the form of interactive dashboards using line graphs, bar charts, heat maps, tree maps, trend analysis, Pareto charts, and bubble charts to enhance data analysis.
- Monitored system life cycle deliverables and activities to ensure that procedures and methodologies were followed and that complete documentation was captured.
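As an illustration of the Spark SQL work above, the following is a minimal PySpark sketch of loading JSON into a Hive table; the bucket path, database, and table/column names are hypothetical placeholders, not project details.

```python
# Minimal sketch of the JSON-to-Hive load pattern (names are illustrative).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("json_to_hive_load")
    .enableHiveSupport()          # needed so saveAsTable writes to the Hive metastore
    .getOrCreate()
)

# Read semi-structured JSON; Spark infers the schema into a DataFrame
# (the modern replacement for the older SchemaRDD API).
events_df = spark.read.json("s3://example-bucket/raw/events/*.json")

# Handle the structured data with Spark SQL before persisting.
events_df.createOrReplaceTempView("events_raw")
cleaned_df = spark.sql("""
    SELECT event_id, customer_id, event_type, CAST(event_ts AS timestamp) AS event_ts
    FROM events_raw
    WHERE event_id IS NOT NULL
""")

# Persist as a Hive table for downstream reporting.
cleaned_df.write.mode("overwrite").saveAsTable("analytics.events_cleaned")
```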
Technical Environment: SQL, Snowflake, Python 3.x (Scikit-learn/ Keras/ SciPy/ NumPy/ Pandas/ Matplotlib/ NLTK/ Seaborn), Tableau (9.x/10.x), Hive, Databricks, Airflow, PostgreSQL, AWS, JIRA, GitHub.
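The pandas data-wrangling bullet above is sketched below under assumed inputs; the file name and column names are hypothetical.

```python
# Minimal pandas wrangling sketch: clean, transform, and reshape (names are illustrative).
import pandas as pd

orders = pd.read_csv("orders_extract.csv", parse_dates=["order_date"])

# Clean: drop exact duplicates and fill missing amounts with 0.
orders = orders.drop_duplicates()
orders["amount"] = orders["amount"].fillna(0)

# Transform: derive a reporting month and normalize a text column.
orders["order_month"] = orders["order_date"].dt.to_period("M").astype(str)
orders["region"] = orders["region"].str.strip().str.title()

# Reshape: pivot to one row per month with a column per region.
monthly_by_region = orders.pivot_table(
    index="order_month", columns="region", values="amount", aggfunc="sum", fill_value=0
)
print(monthly_by_region.head())
```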
Confidential
Data Engineer
Responsibilities:
- Worked on the data transformation team for an internal application called NEXUS, responsible for all of the application's data development, data analysis, data reporting, data quality, and data maintenance efforts, which were the team's primary focus.
- Worked in a production environment, building a CI/CD pipeline in Jenkins with stages ranging from code checkout from GitHub to deploying code to a specific environment.
- Created various Boto scripts to maintain and automate the application in the cloud environment (a boto3 sketch appears at the end of this section).
- Analyzed Incident, Change, and Job data from Snowflake and created a dependency-tree-based model of incident occurrence for every internal application service.
- Helped business users minimize manual work by creating Python scripts (e.g., LDA sourcing) that pull cloud metrics from OneLake, SDP, S3, Databricks, Databench, and Snowflake.
- Partnered with the D4 Rise Analytics team to eliminate their manual work on Incident data, generating metrics from sources such as ServiceNow, Snowflake, AROW job data, and other API calls, and created an Incident Dashboard with extensive built-in intelligence.
- Used Spark SQL and the PySpark library to pull data from Hive into Spark DataFrames and convert them to pandas DataFrames for analysis in Python (see the sketch after this list).
- Created complex SQL Stored Procedures, Triggers, HIVE queries and User Defined Functions to manage incoming data as well as to support existing applications.
- Queried and analyzed large amounts of data on Hadoop HDFS using Hive and Spark.
- Led the interns' project, getting them started with AWS and best practices, suggesting simpler alternative approaches, and monitoring their work weekly.
- Created the AWS cloud environment for the application, designed its network model, and worked intensively to set up the application's components in AWS, from the user interface to the back end.
- Built a prototype for a project containing various kinds of dashboard metrics for all the applications to have a central metric tracker.
- Built a process to report OneLake usage for all LOBs to senior VPs, using a Kibana data source, the EFK stack, and Kinesis streams, and created AWS Lambda functions that automate it daily.
- Built interactive dashboards and stories using Tableau Desktop for accuracy in report generation applying advanced Tableau functionality: parameters, actions and tooltip changes.
- Worked on testing an internal log builder application intended to be rolled out across the company.
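The Hive-to-pandas step mentioned in the list above might look like the sketch below; the table, database, and column names are assumptions for illustration.

```python
# Minimal sketch: query Hive with Spark SQL, then convert to pandas for analysis.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("incident_analysis")
    .enableHiveSupport()
    .getOrCreate()
)

# Query the Hive table with Spark SQL; this part stays distributed.
incidents_sdf = spark.sql("""
    SELECT incident_id, app_service, severity, opened_at
    FROM ops_db.incidents
    WHERE opened_at >= '2020-01-01'
""")

# Convert to a pandas DataFrame for local analysis.
# Only safe when the filtered result comfortably fits in driver memory.
incidents_pdf = incidents_sdf.toPandas()
print(incidents_pdf.groupby("app_service")["incident_id"].count().head())
```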
Technical Environment: Python 3.x, Tableau (9.x/10.x), Hadoop, HDFS, PySpark, Teradata, PostgreSQL, AWS, Jenkins, SQL, Snowflake, JIRA, GitHub, Agile/ SCRUM.
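A hedged boto3 sketch of the kind of cloud maintenance automation referenced above: it reports the state of the application's tagged EC2 instances and checks that an S3 bucket is reachable. The tag value, bucket name, and region are hypothetical.

```python
# Illustrative boto3 maintenance script (resource names are placeholders).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3")

# List instances carrying the application's tag and print their state.
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:Application", "Values": ["nexus"]}]
)["Reservations"]
for reservation in reservations:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])

# Confirm the application's bucket is reachable (raises an error if it is not).
s3.head_bucket(Bucket="example-nexus-artifacts")
```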
Confidential
Data Engineer
Responsibilities:
- Developed a pipeline using Hive (HQL) to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
- Analyzed and gathered business requirements from clients, conceptualized solutions with technical architects, and verified approach with appropriate stakeholders, developed E2E scenarios for building the application.
- Derived data from relational databases to perform complex data manipulations and conducted extensive data checks to ensure data quality. Performed data wrangling to clean, transform, and reshape the data using the NumPy and Pandas libraries.
- Worked with datasets of varying size and complexity, including both structured and unstructured data, and participated in all phases of data mining, data cleaning, data collection, variable selection, feature engineering, model development, validation, and visualization; performed gap analysis.
- Optimized many SQL statements and PL/SQL blocks by analyzing the execution plans of SQL statements, and created and modified triggers, SQL queries, and stored procedures for performance improvement.
- Implemented Predictive analytics and machine learning algorithms in Databricks to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
- Performed feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing, and data imputation using various methods in the Scikit-learn package in Python (see the sketch after this list).
- Used Sqoop to move data from the Oracle database into Hive by creating delimiter-separated files, exposing them at an external location as an external Hive table, and then moving the data into refined tables in Parquet format using Hive queries (a sketch of the Hive load step appears at the end of this section).
- Used Teradata utilities such as FastExport and MultiLoad (MLOAD) to handle various data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Evaluated the performance of the Databricks environment by converting complex Redshift scripts to Spark SQL as part of a new technology adoption project.
- Led engagement planning; developed and managed Tableau implementation plans for the stakeholders, ensuring timely completion and successful delivery according to stakeholder expectations.
- Managed workload and utilization of the team. Coordinated resources and processes to achieve Tableau implementation plans.
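The feature-engineering bullet above is illustrated below with a minimal scikit-learn sketch covering imputation, normalization, PCA, and label encoding; the data is synthetic and the feature/label values are invented for the example.

```python
# Minimal scikit-learn preprocessing sketch (synthetic data, illustrative only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic numeric features with some missing values, plus string labels.
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0], [np.nan, 210.0]])
y = ["churn", "active", "active", "churn"]

# Impute -> normalize -> reduce dimensionality.
feature_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=1)),
])
X_transformed = feature_pipeline.fit_transform(X)

# Encode the string labels as integers.
y_encoded = LabelEncoder().fit_transform(y)
print(X_transformed.shape, y_encoded)
```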
Technical Environment: R, Python, ETL, Agile, Data Quality, R Studio, Tableau, Data Governance, Supervised & Unsupervised Learning, Java, NumPy, SciPy, Hadoop, Sqoop, HDFS, Spark SQL, Pandas, PostgreSQL, AWS (EC2, RDS, S3), Matplotlib, Scikit-learn, Shiny
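A hedged sketch of the Hive side of the Sqoop flow described above: the delimiter-separated files exported by Sqoop are exposed as an external table and then loaded into a refined Parquet table. The Sqoop import itself is a CLI step and is not shown; database names, paths, and columns are assumptions.

```python
# Illustrative staging-to-refined load over Sqoop-delivered delimited files.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sqoop_staging_to_refined")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS staging")
spark.sql("CREATE DATABASE IF NOT EXISTS refined")

# External table over the delimited files written by the Sqoop import.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.customers_raw (
        customer_id BIGINT,
        name        STRING,
        signup_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/staging/customers'
""")

# Refined table persisted in Parquet format.
refined_df = spark.sql("""
    SELECT customer_id, name, CAST(signup_date AS DATE) AS signup_date
    FROM staging.customers_raw
""")
refined_df.write.mode("overwrite").format("parquet").saveAsTable("refined.customers")
```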
Confidential
Data Engineer
Responsibilities:
- Created and presented models for potential holdings to fund managers, achieving 20% better returns vs. historical performance and predicting stock prices 25% better than traditional estimates.
- Facilitated User Acceptance Testing and identified KPIs in the Reference data to predict failure of client's payments.
- Worked with machine learning algorithms such as linear regression, KNN, and decision trees on trading problems, and evaluated machine learning algorithm performance on time-series data (stock price data).
- Calculated and presented data visualization report for daily returns, cumulative returns, simple moving averages, Sharpe ratio, portfolio value for stock performance optimization.
- Analyzed large data sets using pandas, built regression models using SciPy to predict future data, visualized the results, and used SQL to manipulate data.
- Imported/exported large amounts of data from files to Teradata and created Teradata-specific physical data models that included primary indexes, secondary indexes, and join indexes.
- Implemented Dimensional Data Modeling to deliver Multi-Dimensional STAR schemas and Developed Snowflake Schemas by normalizing the dimension tables as appropriate.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries; defined source fields and their definitions.
- Used HIVE to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Used Redshift and S3 within AWS, along with Informatica Cloud Services, to load data into S3 buckets.
- Worked with HDFS file formats like Avro, Sequence File and various compression formats like Snappy.
- Collaborated with Data engineers and operation team to implement ETL process, Snowflake models, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Wrote SQL and PL/SQL scripts to extract data from the database and for testing purposes.
- Created a data pipeline package to move data from an Amazon S3 bucket to a MySQL database and executed MySQL stored procedures using events to load data into tables (see the sketch after this list).
- Designed and developed Tableau graphical and visualization solutions based on business requirements, and worked closely with the backend data retrieval and data mart teams to guide the proper structuring of data for Tableau reporting.
- Worked on metadata in Tableau desktop to alter data types, assign roles, rename fields, joins, refresh data and change extract data options to prepare data sources to be utilized by users.
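A minimal sketch of the S3-to-MySQL pipeline step described above, using boto3, pandas, and SQLAlchemy; the bucket, key, connection string, table, and stored-procedure names are hypothetical placeholders.

```python
# Illustrative S3-to-MySQL load (assumes the pymysql driver; names are placeholders).
import boto3
import pandas as pd
from sqlalchemy import create_engine

# Download the daily extract from S3 to a local file.
s3 = boto3.client("s3")
s3.download_file("example-trading-bucket", "exports/daily_prices.csv", "/tmp/daily_prices.csv")

# Load the extract and write it to a MySQL staging table.
prices = pd.read_csv("/tmp/daily_prices.csv", parse_dates=["trade_date"])
engine = create_engine("mysql+pymysql://etl_user:etl_password@db-host:3306/marketdata")
prices.to_sql("stg_daily_prices", engine, if_exists="replace", index=False)

# Invoke the stored procedure that moves staged rows into the final tables.
with engine.begin() as conn:
    conn.exec_driver_sql("CALL load_daily_prices()")
```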
Technical Environment: Python 3.6.4, MLLib, SQL Server, Hive, Hadoop Cluster, ETL, Spyder 3.6, Agile, Tableau, Java, NumPy, AWS (EC2, S3), Informatica Power Center, Teradata.