Data Engineer Resume
Chicago, IL
SUMMARY:
- 5+ years of IT experience as a Big Data Engineer, AWS Data Engineer, and Programmer Analyst.
- Proven expertise in deploying major software solutions for high-end clients, meeting business requirements such as big data processing, ingestion, analytics, and cloud migration from on-prem to AWS.
- Expertise in deploying cloud-based services with Amazon Web Services (Databases, Migration, Compute, IAM, Storage, Analytics, Network & Content Delivery, Lambda and Application Integration).
- Worked through the end-to-end Software Development Life Cycle in an Agile environment using Scrum methodology.
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Experience with Ruby on Rails, developing user-friendly, efficient web applications tailored to each client's needs.
- Experience with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
- Hands-on expertise with AWS storage services such as S3, EFS, and Storage Gateway, plus working familiarity with Snowball.
- Strong understanding of AWS compute services such as EC2, Elastic MapReduce (EMR), and EBS, including accessing instance metadata.
- Hands-on experience setting up Kubernetes (k8s) clusters for running microservices; took several microservices into production on Kubernetes-backed infrastructure.
- In-depth understanding of AWS monitoring/auditing tools such as CloudWatch and CloudTrail.
- Developed and deployed various AWS Lambda functions using the built-in Lambda libraries, and also deployed Lambda functions written in Scala with custom libraries.
- Expert understanding of AWS DNS services through Route 53, including Simple, Weighted, Latency, Failover, and Geolocation routing policies.
- Strong understanding of AWS encryption both in transit (SSL/TLS) and at rest (SSE-S3, KMS, HSM, AES-256, and SSE-C client & server side).
- Expert understanding of AWS network and content delivery services through Virtual Private Cloud (VPC).
- Worked on ETL migration by developing and deploying AWS Lambda functions that form a serverless data pipeline, writing to the Glue Data Catalog so the data can be queried from Athena (see the Lambda sketch after this list).
- Wrote CloudFormation templates in JSON for the network and content delivery layer of the AWS cloud environment.
- Strong understanding of the Hadoop framework and big data ecosystems, with proven expertise in big data ingestion, processing, and workflow design.
- Worked with Sqoop for data ingestion, Hive and Spark for data processing, and Oozie for designing complex workflows in the Hadoop framework.
- Worked on Oozie, Airflow & SWF for complex workflows. Familiarity with HUE.
- Worked on a Scala code base for Apache Spark, performing actions and transformations on RDDs, DataFrames, and Datasets using Spark SQL and Spark Streaming contexts (an illustrative sketch follows this list).
- Accessed and ingested Hive tables into Apache Kudu for data warehousing after performing record joins using the Kudu context in Apache Spark.
- Good understanding of Apache Spark's high-level architecture and performance-tuning patterns.
- Strong knowledge of Hive (architecture, Thrift servers), HQL, Beeline, and third-party JDBC connectivity to Hive.
- Migrated Hive and MapReduce jobs to EMR and Qubole, automating the workflows with Airflow.
- Ease of operability with the UNIX file system through the command-line interface.
- Strong working knowledge of UNIX commands, such as changing file and group permissions.
- Handled server-to-server key encryption and access for various owners and groups in UNIX environments.
- Delivered complex POCs against business requirements from the technical side.
- Wrote test cases to achieve unit test coverage.
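The serverless Glue/Athena pipeline mentioned above follows a pattern like this minimal sketch; the bucket, crawler, database, and query names are placeholders rather than the actual production resources:

```python
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

CRAWLER = "raw-data-crawler"                # hypothetical Glue crawler name
DATABASE = "serverless_catalog_db"          # hypothetical Glue database
RESULTS = "s3://example-athena-results/"    # hypothetical query-result location


def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event when new raw files land."""
    for record in event.get("Records", []):
        print("New object:", record["s3"]["object"]["key"])

    # Re-crawl the landing prefix so new data/partitions appear in the Glue Data Catalog.
    glue.start_crawler(Name=CRAWLER)

    # Fire-and-forget Athena query against the cataloged table (illustrative only).
    athena.start_query_execution(
        QueryString="SELECT count(*) FROM raw_events",
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": RESULTS},
    )
    return {"status": "catalog refresh requested"}
```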
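The Spark work described above was done in Scala; the PySpark sketch below illustrates the same kind of transformations, Spark SQL usage, and actions, with assumed paths and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

# Transformation: read raw events (placeholder path) and derive a cleaned DataFrame.
events = spark.read.json("s3://example-bucket/raw/events/")
cleaned = (
    events
    .filter(F.col("price") > 0)
    .withColumn("event_date", F.to_date("event_ts"))
)

# Spark SQL over a temp view, then an action (show) to materialize results.
cleaned.createOrReplaceTempView("events")
daily = spark.sql(
    "SELECT event_date, COUNT(*) AS events FROM events GROUP BY event_date"
)
daily.show(10)

# The same data on the RDD API: a map transformation followed by a count action.
print(cleaned.rdd.map(lambda row: row["event_date"]).distinct().count())
```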
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Storm, and Flume.
Spark Streaming Technologies: Spark, Kafka, Storm
Scripting Languages: Python, Scala, Ruby (Rails), Bash, and Cassandra CQL.
Programming Languages: Java, SQL, JavaScript, HTML5, CSS3
Databases: Data warehouses, RDBMS, NoSQL (MongoDB Certified), Oracle.
Tools: Eclipse, JDeveloper, MS Visual Studio, Microsoft Azure HDInsight, Microsoft Hadoop cluster, JIRA.
Methodologies: Agile, UML, Design Patterns.
Operating Systems: Unix/Linux
PROFESSIONAL EXPERIENCE:
Confidential - Chicago, IL
Data Engineer
Responsibilities:
- Implemented Hadoop jobs on an EMR cluster, running several Spark, Hive, and MapReduce jobs that process data for recommendation engines, transactional fraud analytics, and behavioral insights.
- Team player for DataLake production support. The DataLake typically supports over 750 million searches/day and 9 billion pricing inventory updates/day, with 14 trillion automated transactions/year generating around 1.2 TB of data daily.
- Populated the DataLake by leveraging Amazon S3, with interactions handled through Amazon Cognito, Boto, and s3cmd.
- Parsed data from S3 via Python API calls through Amazon API Gateway, generating the batch source for processing (see the boto3 sketch after this list).
- Applied best practices in Ruby on Rails development.
- Scheduled batch jobs through AWS Batch, performing data processing by leveraging the Apache Spark APIs in Scala.
- Ran Spark jobs with the Spark Core and Spark SQL libraries for processing the data.
- Strong familiarity with Spark RDDs, actions and transformations, DataFrames, Spark SQL, and Spark file formats.
- Expert knowledge of Hadoop and Spark high-level architectures.
- Hands-on experience implementing and deploying Elastic MapReduce (EMR) clusters on AWS with EC2 instances.
- Involved in developing and implementing the web application using Ruby on Rails.
- Good familiarity with AWS services such as DynamoDB, Redshift, Simple Storage Service (S3), and Amazon Elasticsearch Service.
- Strong knowledge of EMRFS, S3 bucketing, m3.xlarge and c3.4xlarge instance types, IAM roles, and CloudWatch Logs.
- The cluster was deployed through the YARN scheduler and sized with auto scaling.
- Used Airflow for complex workflow automation, with process automation handled by shell wrapper scripts (see the DAG sketch after this list).
- Used Active Record for database migrations, and also worked with Active Resource, fixtures, Action View, and Action Controller in the Rails framework.
- Great familiarity with Linux environment and user groups.
- Worked with JSON, CSV, Sequential and Text file formats.
- Imported data from DynamoDB to Redshift in batches with AWS Batch, scheduled through TWS.
- Used Flask and Ganglia for monitoring Spark jobs.
- Involved in Data Validation and fixing discrepancies by working in coordination with the Data Integration and Infra Teams.
- Situational awareness of the staging environment, active JIRAs and production support issues.
- Dependencies were managed with the simple build tool (sbt).
- Worked with data owners, business units, the Data Integration team, and customers in a fast-paced Agile/Scrum environment.
- Familiarity with reporting and BI tools used in the reporting & visualization processes.
- Active involvement in Business meetings and team meetings.
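A minimal sketch of the S3-to-batch-source parsing step referenced above, using boto3 with assumed bucket and prefix names:

```python
import csv
import io

import boto3

s3 = boto3.client("s3")
BUCKET = "example-datalake"       # hypothetical bucket
PREFIX = "pricing/2019-06-01/"    # hypothetical daily prefix


def load_batch(bucket=BUCKET, prefix=PREFIX):
    """Yield parsed rows from every CSV object under the prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            for row in csv.DictReader(io.StringIO(body.decode("utf-8"))):
                yield row


if __name__ == "__main__":
    records = list(load_batch())
    print(f"loaded {len(records)} records as the batch source for Spark processing")
```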
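The Airflow-driven automation noted above can be sketched as a DAG like the one below; the schedule, DAG name, and wrapper-script paths are illustrative assumptions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="datalake_daily_batch",          # hypothetical DAG name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_to_s3",
        bash_command="/opt/pipeline/wrappers/ingest.sh {{ ds }} ",         # wrapper script (placeholder path)
    )
    process = BashOperator(
        task_id="spark_process",
        bash_command="/opt/pipeline/wrappers/run_spark_job.sh {{ ds }} ",  # wrapper script (placeholder path)
    )
    ingest >> process  # run the Spark step only after ingestion completes
```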
Environment: Amazon Web Services, Elastic MapReduce (EMR 4.1.0) cluster, EC2 instances, m3.xlarge (master), c3.4xlarge (slaves), Airflow, Amazon S3, Amazon Redshift, DynamoDB, CloudWatch Logs, IAM roles, Ganglia, EMRFS, s3cmd (batches), Ruby EMR utility (monitoring), Boto3, Amazon Cognito, AWS API Gateway, AWS Lambda, Kinesis (Streams), Hive, Scala, Python, HBase, Sqoop, SBT, Apache Spark, Spark SQL, Shell Scripting, Tableau.
Confidential - Champaign, IL
Big Data Engineer
Responsibilities:
- Responsible for creating an e-commerce application per business requirements.
- The application was deployed using JSON and AngularJS with MongoDB, a NoSQL database, as the data store.
- Performed transformations using Cassandra Query Language (CQL).
- Data is ingested into the application using Hadoop technologies such as Pig and Hive.
- Feedback data is retrieved using Sqoop.
- Became a major contributor and potential committer of an important open source Apache project.
- Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW (an illustrative streaming-style sketch follows this list).
- Deployed the application to GCP using Spinnaker (RPM-based).
- Launched a multi-node Kubernetes cluster in Google Kubernetes Engine (GKE) and migrated the Dockerized application from AWS to GCP.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the query sketch after this list).
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Troubleshot, managed, and reviewed data backups and Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Defined Oozie Job flows.
- Loaded log data directly into HDFS using Flume.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Followed standard Back up policies to make sure the high availability of cluster.
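The raw-data parsing jobs above were classic MapReduce; the sketch below shows the same idea as a Hadoop Streaming mapper in Python, with an assumed pipe-delimited record layout:

```python
# mapper.py - run as the map step of a Hadoop Streaming job; a companion
# reducer would sum the emitted amounts per key before loading staging tables.
import sys


def main():
    """Parse pipe-delimited raw records from stdin and emit key<TAB>value pairs."""
    for line in sys.stdin:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 4:
            continue  # skip malformed records
        record_id, category, event_ts, amount = parts[:4]
        # Key by category and date so downstream tables stay partition-friendly.
        print(f"{category},{event_ts[:10]}\t{amount}")


if __name__ == "__main__":
    main()
```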
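The trend-spotting Hive queries can be illustrated with a sketch like the following, submitted here through PyHive; the host, table, and column names are assumptions, not the actual EDW objects:

```python
from pyhive import hive  # assumes a reachable HiveServer2 endpoint

# Compare fresh daily counts against an EDW baseline to surface emerging trends.
# daily_product_views and edw_category_baseline are hypothetical tables.
HQL = """
SELECT f.category,
       f.day_count,
       r.avg_30d_count,
       (f.day_count - r.avg_30d_count) / r.avg_30d_count AS pct_change
FROM daily_product_views f
JOIN edw_category_baseline r
  ON f.category = r.category
WHERE f.view_date = '2016-03-01'
ORDER BY pct_change DESC
LIMIT 20
"""

conn = hive.connect(host="hive-gateway.example.internal", port=10000)  # placeholder host
cursor = conn.cursor()
cursor.execute(HQL)
for category, day_count, baseline, pct in cursor.fetchall():
    print(category, day_count, baseline, round(pct, 3))
conn.close()
```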
Environment: Cassandra, HDFS, MongoDB, ZooKeeper, Oozie, Pig, Google Cloud Platform (GCP), Kubernetes, GitHub, Jenkins, Docker, JIRA, Unix/Linux CentOS 7, Nexus v3, Bash shell scripting, Python, Node.js, Apache Tomcat, SQL.
Confidential
Software Engineer
Responsibilities:
- Primary responsibility was hands-on code development for a large-scale, complex business application.
- Developed and implemented prototypes in PL/SQL, Java, and JavaScript to fulfill change requests.
- Participated in impact assessment, technical design, and development activities with architects, and documented technical specifications.
- Developed the user interfaces using Web 2.0, AJAX, JSP, Struts, HTML, CSS, JavaScript, and DHTML.
- Generalized the TopLink mapping template for the application so that TopLink-related code could be separated from the actual DAO implementation.
- Worked on MVC frameworks, primarily WebWork and Struts 2.0, with Spring dependency injection for application customization and upgrades.
- Involved in unit testing and support for downstream test phases; debugged and resolved performance issues in production and pre-production environments.
- Conducted code reviews and knowledge transfer sessions.
- Reporting progress, status, risk and issue in a timely manner to immediate supervisor.
- Analyzed and improved code to fix errors encountered and to improve efficiency, reliability, and performance.
- Experience working with Java, J2EE, React, NoSQL databases, and web technologies.
- Experience working with JavaScript and Node.js, creating controllers, custom directives, and built-in directives.
- Created custom views using Bootstrap, HTML5, and CSS3.
- Proficient in writing unit test cases and documentation of test results for client server applications.
- Worked on the backend system with Oracle 11g, developing in SQL and PL/SQL.
Environment: Java, JavaScript, J2EE, Web 2.0, AJAX, JSP, Struts, HTML, HTML5, CSS, CSS3, Node.js, Bootstrap, Oracle 11g, NoSQL databases, PL/SQL.
Confidential
Programmer Analyst
Responsibilities:
- Implemented several user stories using Core Java, HTML, and CSS.
- Solid understanding of Object Oriented Design and analysis with extensive experience in the full life cycle of the software design process including requirement definition, prototyping, Proof of Concept, Design, Implementation and Testing.
- Experience on the backend using collections, structs, and maps, along with DHTML and JavaScript; IDEs and tools: Eclipse, Notepad++.
- Involved in the Software Development Life Cycle phases like Requirement Analysis, Implementation and estimating the time-lines for the project.
- Assisted in resolving data level issues dealing with input & output streams.
- Developed custom directives (elements, attributes, and classes).
- Developed single-page applications using AngularJS.
- Extensively involved in redesigning the entire site with CSS styles for consistent look and feel across all browsers and all pages.
- Used Angular MVC and two-way data binding.
- Created Images, Logos and Icons that are used across the web pages using Adobe Flash and Photoshop.
- Developed interactive UIs for front-end users using front-end technologies such as HTML, CSS, JavaScript, and jQuery.
- Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
Environment: Core Java, JavaScript, UI/UX, Linux, Shell Scripting, Web Browsers, Instrumentation, Oracle SQL Server, SQL queries, Relational Databases.