
Data Engineer Resume


Flatiron, NY

PROFESSIONAL SUMMARY:

  • 7+ years of professional experience as a Data Engineer, including analytical programming with Python. Experienced as an AWS Cloud Engineer (Administrator) working with AWS services including IAM, EC2, VPC, AMI, SNS, RDS, SQS, EMR, Lambda, Glue, Athena, DynamoDB, CloudWatch, Auto Scaling, S3, and Route 53. Worked in various Linux server environments from DEV all the way to PROD, along with cloud-powered strategies embracing Amazon Web Services (AWS).
  • Good knowledge of web services using the gRPC and GraphQL protocols.
  • Used gRPC and GraphQL as a data gateway.
  • Strong experience in CI (Continuous Integration) / CD (Continuous Delivery) software development pipeline stages such as Commit, Build, Automated Tests, and Deploy, using Bogie pipelines in Jenkins.
  • Experience using analytic data warehouses such as Snowflake.
  • Experience using Databricks to handle analytical processes from ETL through data modeling, leveraging familiar tools, languages, and skills via interactive notebooks or APIs.
  • Experience with Apache Airflow for authoring workflows as directed acyclic graphs (DAGs), visualizing batch and real-time data pipelines running in production, monitoring progress, and troubleshooting issues when needed.
  • Experience with the Quantum framework to easily ingest, process, and act on batch and streaming data using Apache Spark.
  • Worked with Docker containers, combining them with workflows to keep them lightweight.
  • Experience in tuning EMR according to requirements when importing and exporting data using stream-processing platforms such as Kafka.
  • Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce (EMR), and AWS CloudWatch.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark, working with Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
  • Experienced in moving data from different sources using Kafka producers and consumers and preprocessing the data.
  • Proficient in data warehousing and data mining concepts and ETL transformations from source to target systems.
  • Experience building servers on AWS, including importing necessary volumes and launching EC2 instances.
  • Created security groups, auto scaling, load balancers, Route 53, and SNS as per the architecture.
  • Experience setting up lifecycle policies to back up data from AWS S3 to AWS Glacier (a minimal sketch follows this list); worked with various AWS, EC2, and S3 CLI tools.
  • Expertise in DevOps, release engineering, configuration management, cloud infrastructure, and automation, including Amazon Web Services (AWS), Apache Maven, Jenkins, GitHub, and Linux.
  • Experienced in creating user/group accounts, federated users, and access management for user/group accounts using the AWS IAM service.
  • Set up databases in AWS using RDS and storage using S3 buckets, and configured instance backups to an S3 bucket.
  • Expertise in querying RDBMSs such as PostgreSQL, MySQL, and SQL Server using SQL to ensure data integrity.
  • Experienced in working on big data integration and analytics based on Hadoop and Kafka.
  • Excellent understanding and knowledge of Hadoop Distributed File System (HDFS) data modeling, architecture, and design principles.
  • Experience in Hadoop cluster performance tuning by gathering and analyzing the existing infrastructure.
  • Constructed a Kafka broker with configurations appropriate to the organization's big data needs.
  • Good experience in shell scripting, SQL Server, UNIX, and Linux, and knowledge of version control with Git and GitHub.
  • Strong team player with the ability to work both independently and in a team, adapt to a rapidly changing environment, and a commitment to learning; excellent communication, project management, documentation, and interpersonal skills.
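
A minimal sketch of the S3-to-Glacier lifecycle configuration mentioned above, using boto3; the bucket name, prefix, and 30-day transition window are illustrative assumptions rather than values from a specific project.

```python
# Sketch: apply an S3 lifecycle rule that transitions objects to Glacier after 30 days.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",           # illustrative bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Filter": {"Prefix": "raw/"},  # only archive objects under raw/
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"}
                ],
            }
        ]
    },
)
```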

TECHNICAL SKILLS:

Big Data Eco System: Hadoop, Kafka, Sqoop, Zookeeper, Spark, Avro, Parquet and Snappy.

Languages: Python, Scala, SQL, Linux shell scripting

Databases: Oracle, DB2, SQL Server, MySQL, PL/SQL, NoSQL, RDS, HBase, PostgreSQL

AWS: EC2, S3, EMR, DynamoDB, Athena, AWS Data Pipeline, AWS Lambda, CloudWatch, SNS, SQS

Microservices Tools: gRPC, GraphQL

CI/CD Tools: Jenkins, Artifactory.

Build Tools: Maven

IDE/Programming Tools: NetBeans, PyCharm.

Virtualization tools: Docker

Source control tools: Git

Operating Systems: UNIX, Linux, Windows.

J2EE Technologies: Servlets, JDBC.

Web Technologies: HTML, CSS, XML, Bootstrap

Libraries and Tools: PySpark, Psycopg2, PySpells, PyArrow, Pandas, MySQL, Boto3, Jira, Scrum.

PROFESSIONAL EXPERIENCE:

Data Engineer

Confidential, Flatiron, NY

Responsibilities:

  • Craft highly scalable and resilient cloud architectures that address customer business problems and accelerate the adoption of AWS services for clients.
  • Built application and database servers using AWS EC2, created AMIs, and used RDS for PostgreSQL.
  • Carried out deployments and builds in various environments using Jenkins as the continuous integration tool, and designed the project workflows/pipelines in Jenkins.
  • Used Terraform to express infrastructure as code when building EC2, Lambda, and RDS resources.
  • Built analytical warehouses in Snowflake and queried data in staged files by referencing metadata columns in a staged file.
  • Performed continuous data loads using Snowpipe with appropriate file sizing, and loaded structured and semi-structured data into Snowflake using web interfaces.
  • Involved in designing APIs for networking and cloud services, and leveraged Spark (PySpark) to manipulate unstructured data and apply text mining to users' table-utilization data.
  • Designed a data quality framework to perform schema validation and data profiling on Spark (PySpark); see the schema-validation sketch after this list.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Used the Pandas API to put the data in time-series and tabular format for easy timestamp-based data manipulation and retrieval.
  • Integrated the web portal and its users with an S3 bucket, and used Amazon S3 to back up database instances periodically by saving snapshots of the data.
  • Configured a Kafka broker for the project's Kafka cluster and streamed the data to Spark Structured Streaming, using case-class schemas to obtain structured data (see the streaming sketch after this list).
  • Implemented Spark on EMR to process enterprise data across our data lake in AWS.
  • Fine-tuned EC2 for long-running Spark applications to achieve better parallelism and executor memory for more caching.
  • Experience working with Docker Hub, creating Docker images, and handling multiple images, primarily for middleware installations and domain configuration.
  • Developed Git hooks for local-repository code commits and remote-repository code pushes, and worked on GitHub.
  • Developed Airflow workflows to schedule batch and real-time data movement from source to target (a minimal DAG sketch follows this list).
  • Backed up AWS Postgres to S3 via a daily job run on EMR using DataFrames.
  • Worked on ETL processing consisting of data transformation, data sourcing, mapping, conversion, and loading.
  • Responsible for the technical architecture and creation of technical specifications, and designed ETL processes covering mapping and the source, target, and staging databases.
  • Knowledge of cloud-based DAGs and Apache Airflow.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
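
An illustrative PySpark sketch of the kind of schema-validation and profiling check used in the data quality framework described above; the expected schema, column names, and S3 path are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("dq-schema-check").getOrCreate()

# Hypothetical contract for the incoming dataset.
expected = StructType([
    StructField("user_id", LongType(), False),
    StructField("event", StringType(), True),
    StructField("ts", StringType(), True),
])

df = spark.read.json("s3://example-bucket/events/")  # illustrative input path

# Fail fast if the ingested schema drifts from the contract.
missing = set(expected.fieldNames()) - set(df.schema.fieldNames())
if missing:
    raise ValueError(f"Schema validation failed, missing columns: {missing}")

# Simple profiling: null counts per expected column.
total = df.count()
for name in expected.fieldNames():
    nulls = df.filter(df[name].isNull()).count()
    print(f"{name}: {nulls} nulls out of {total} rows")
```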
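
A minimal sketch of reading a Kafka topic with Spark Structured Streaming and applying a schema, in the spirit of the streaming work described above; the broker address, topic, and message schema are illustrative, and the job assumes the spark-sql-kafka package is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-structured-streaming").getOrCreate()

# Hypothetical message schema for the topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # illustrative broker
       .option("subscribe", "orders")                      # illustrative topic
       .load())

# Parse the Kafka value bytes into structured columns.
orders = raw.select(from_json(col("value").cast("string"), schema).alias("o")).select("o.*")

query = (orders.writeStream
         .format("console")    # console sink just for the sketch
         .outputMode("append")
         .start())
query.awaitTermination()
```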
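
A minimal Airflow DAG sketch (Airflow 2.x style) of a daily source-to-target batch workflow like the ones described above; the DAG id, schedule, and the extract/load callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from source")   # placeholder for the real extract step

def load():
    print("writing data to target")     # placeholder for the real load step

with DAG(
    dag_id="source_to_target_daily",    # illustrative DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load.
    extract_task >> load_task
```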

Environment: Python 2.7 & 3.7, AWS Lambda, SNS, SQS, EMR, EC2, CloudWatch, RDS, Spark, Linux, Shell Scripting, GitHub, Jira

AWS/Python Developer

Confidential, Malvern, PA

Responsibilities:

  • Wrote scripts in Python to automate testing framework jobs.
  • Used a multi-threading factory model to distribute learning-process back-testing into various worker processes.
  • Loaded data into S3 as a single file using a multi-threaded Python process, and developed a Python wrapper for instantiating the multi-threaded application.
  • Implemented configuration changes for data models.
  • Maintained and updated existing automated solutions.
  • Handled potential points of failure through error handling and communication of failure.
  • Used the Amazon EC2 command line interface along with Bash/Python to automate repetitive work.
  • Created Python script to monitor server load performance in production environment and horizontally scale the servers by deploying new instances.
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
  • Implemented lock mechanisms and used multithreading functionality.
  • Built a Python/Django-based web application with a PostgreSQL database and integrations with third-party email, messaging, and storage services.
  • Used the Pandas API to put the data in time-series and tabular format for easy timestamp-based data manipulation and retrieval.
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash.
  • Developed and delivered training presentations on logical data modeling, application process modeling and metadata management for business units
  • Used an object-relational mapper (ORM) to automate the transfer of data stored in relational database tables into objects.
  • Successfully migrated the database from SQLite to MySQL to PostgreSQL with complete data Integrity.
  • Wrote shell scripts to spin up clusters and instances such as EMR, EC2, and Amazon RDS.
  • Anticipated potential points of failure (database, communication points, file system errors).
  • Troubleshot process execution and worked with other team members to correct issues.
  • Actively worked as a part of team with managers and other staff to meet the goals of the project in the stipulated time.
  • Performed troubleshooting, fixed, and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
  • Used the Pandas library for statistical analysis.
  • Used NumPy for numerical analysis of insurance premiums.
  • Used Psycopg2, written in Python, to make connections to RDS (a minimal connection sketch follows this list).
  • Optimized loading of large data to S3 using a chunk-size method (see the chunked-load sketch after this list).
  • Managed large datasets using Pandas data frames and MySQL.
  • Extensively used Python modules such as requests, urllib, and urllib2 for web crawling.
  • Triggered Lambda functions through CloudWatch.
  • Worked on deployment on AWS EC2 instances with Postgres RDS and S3 file storage.
  • Developed the required XML Schema documents and implemented the framework for parsing XML documents.
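
A minimal sketch of the Psycopg2 connection to a Postgres RDS instance referenced above; the endpoint, credentials, table, and query are placeholder assumptions.

```python
import psycopg2

# Illustrative connection parameters; in practice these would come from
# configuration or a secrets store, not hard-coded values.
conn = psycopg2.connect(
    host="example-db.abc123.us-east-1.rds.amazonaws.com",
    dbname="analytics",
    user="app_user",
    password="********",
    port=5432,
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM premiums;")  # hypothetical table
    print(cur.fetchone()[0])

conn.close()
```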
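
A sketch of the chunk-size approach to loading large files into S3 mentioned above, using pandas and boto3; the input file, bucket, key layout, and chunk size are illustrative, and the Parquet conversion assumes pyarrow is installed.

```python
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")
bucket = "example-data-bucket"  # illustrative bucket name

# Read the large CSV in 100k-row chunks and stage each chunk as a Parquet part.
for i, chunk in enumerate(pd.read_csv("large_input.csv", chunksize=100_000)):
    buf = io.BytesIO()
    chunk.to_parquet(buf, index=False)
    s3.put_object(
        Bucket=bucket,
        Key=f"staged/part-{i:05d}.parquet",
        Body=buf.getvalue(),
    )
```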

Environment: Python 2.7, Django, HTML5/CSS, Postgres, AWS EMR, EC2, SNS, SQS, MySQL, JavaScript, Eclipse, Linux, Shell Scripting, jQuery, GitHub, Jira, PySpark, Bootstrap, Jenkins

AWS/ Python Developer

Confidential, St. Louis, MO

Responsibilities:

  • Booted up nodes using prebuilt images on Amazon EC2; uploaded, copied, downloaded, and deleted files using Amazon S3.
  • Worked on loading CSV/TXT/DAT files using Scala in the Spark framework, processing the data by creating Spark DataFrames and RDDs and saving the files in Parquet format in HDFS to load into fact tables using the ORC reader (a PySpark equivalent is sketched after this list).
  • Worked on various applications using Python-integrated IDEs: Eclipse, PyCharm, and NetBeans.
  • Designed and developed an entire change data capture (CDC) module in Python and deployed it in AWS Glue using the PySpark library (see the Glue job skeleton after this list).
  • Built database models, views, and APIs using Python for interactive web-based solutions.
  • Used Python scripts to update content in the database and manipulate files.
  • Wrote and executed several complex SQL queries in AWS Glue for ETL operations on Spark DataFrames using Spark SQL.
  • Automated most of the daily tasks using Python scripting.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • As point of contact on the L2 support team, worked with the application, database, reporting, SQM, and project management teams to resolve issues.
  • Responsible for user validations on the client side as well as the server side.
  • Automated the existing scripts for performance calculations using NumPy and SQLAlchemy.
  • Interacted with QA to develop test plans from high-level design documentation.
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
  • Implemented Restful Web-Services for sending and receiving the data between multiple systems.
  • Sent text data through sockets with most APIs using JSON.
  • Responsible for debugging and troubleshooting the web application.
  • Developed, tested and debugged software tools utilized by clients and internal customers.
  • Coded test programs and evaluated existing engineering processes.
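
A PySpark sketch (in Python rather than the Scala used on the project) of the delimited-file-to-Parquet flow described above; the input and output paths and the header/schema options are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Load a delimited file into a DataFrame; paths and options are illustrative.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/orders.csv"))

# Save as Parquet in HDFS for downstream fact-table loading.
df.write.mode("overwrite").parquet("hdfs:///warehouse/orders_parquet")
```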
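
A hedged skeleton of an AWS Glue PySpark job of the kind described for the CDC module; the catalog database and table names, the CDC filter column, and the output path are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a cataloged source table and run a Spark SQL transformation.
dyf = glue_context.create_dynamic_frame.from_catalog(database="raw_db", table_name="orders")
dyf.toDF().createOrReplaceTempView("orders")

# Hypothetical CDC filter on an operation-code column.
changed = spark.sql("SELECT * FROM orders WHERE op_code IN ('I', 'U', 'D')")

changed.write.mode("append").parquet("s3://example-curated/orders_cdc/")  # illustrative target

job.commit()
```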

Environment: Linux, Python 2.7, 2.5, Django 1.7, 1.5, HTML5, XML, JavaScript, MS SQL Server, NoSQL, Amazon S3, Jenkins, Git, GitHub, JIRA, AWS Services.

Software developer

Confidential

Responsibilities:

  • Implemented user interface guidelines and standards throughout the development and maintenance of the website using HTML, CSS, JavaScript, and jQuery.
  • Contributed to development of security policies and processes.
  • Developed views and templates with Python view controller and templating language to create a user-friendly website interface.
  • Refactored Python modules to deliver a certain format of data.
  • Development of Python APIs to dump the array structures in the Processor at the failure point for debugging.
  • Involved in development of Web Services using SOAP for sending and getting data from the external interface in the XML format.
  • Created RESTful web services for Catalog and Pricing with MySQL, NoSQL, and MongoDB.
  • Interacted with the backend using Java and the Hibernate framework.
  • Involved in Coding of Enterprise Java Beans, which implements business rules, and business logic.
  • Involved in developing the Java classes and JavaBeans.
  • Represented the system in hierarchical form by defining components and subcomponents using Python, and developed a set of library functions over the system based on user needs.
  • Used the Selenium library to write a fully functioning test automation process that allowed the simulation of submitting different requests from multiple browsers to the web application (a minimal sketch follows this list).
  • Created Data tables utilizing PyQt to display customer and policy information and add, delete, update customer records.
  • Applied strong Python experience to support Q-Direct, a major part of whose code is written in Python.
  • Used Python for creating graphics, XML processing, data exchange, and business logic implementation.
  • Utilized in-depth technical experience with LAMP and other leading-edge products and technologies, in conjunction with industry and business skills, to deliver solutions to customers.
  • Developed multiple Spark batch jobs in Scala using Spark SQL, performed transformations using many APIs, and updated master data in the Cassandra database as per the business requirement.
  • Wrote Spark Scala scripts, creating multiple UDFs, Spark contexts, Cassandra SQL contexts, and APIs and methods supporting DataFrames, RDDs, DataFrame joins, and Cassandra table joins, and finally wrote/saved the DataFrames/RDDs to the Cassandra database.
  • As part of the POC migrated the data from source systems to another environment using Spark, SparkSQL.
  • Developed and implemented core API services using Python with spark.
  • Experienced in Extract, Transform, Load (ETL).
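
A minimal sketch of the kind of Selenium request-submission test described above; the URL, form field names, and expected confirmation text are assumptions, and a Chrome driver is assumed to be available.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes chromedriver is available on the machine
try:
    driver.get("https://example.internal/app/request")        # illustrative URL
    driver.find_element(By.NAME, "request_type").send_keys("policy-lookup")
    driver.find_element(By.ID, "submit").click()
    assert "Request received" in driver.page_source           # illustrative assertion
finally:
    driver.quit()
```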

Environment: Linux, Python 2.7, 2.5, HTML5, XML, JavaScript, jQuery, MS SQL Server, NoSQL, Jenkins, MongoDB, Beautiful Soup, Eclipse, Git, GitHub, JIRA.

Software developer

Confidential

Responsibilities:

  • Translated the customer requirements into design specifications and ensured that the requirements translate into software solution.
  • Application was based on service oriented architecture and used Python 2.5, Ajax, HTML, CSS for the frontend.
  • Involved in creating Servlets and Java Server Pages (JSP), which route submittals to the appropriate Enterprise Java Bean (EJB) components and render retrieved information using Session Facade.
  • Designed the front end of the application using Python, HTML, CSS, AJAX, JSON and JQuery. Worked on backend of the application, mainly using Active Records.
  • Involved in the design, development and testing phases of application using AGILE methodology.
  • Developed and designed an API (Restful Web Service).
  • Used the Python language to develop web-based data retrieval systems.
  • Designed and maintained databases using Python and developed a Python-based API (RESTful web service) using Flask, SQLAlchemy, and PostgreSQL (a minimal sketch follows this list).
  • Developed web sites using Python, XHTML, CSS, and JavaScript.
  • Developed and designed e-mail marketing campaigns using HTML and CSS.
  • Tested and implemented applications built using Python.
  • Developed and tested many features for dashboard using Python, ROBOT framework, Bootstrap, CSS, and JavaScript.
  • Created complex dynamic HTML UI using jQuery.
  • Automated Regression analysis for determining fund returns based on index returns (Python/Excel).
  • Worked on development of SQL stored procedures, triggers, and functions on MySQL and NoSQL.
  • Developed a shopping cart for the Library application and integrated web services to access payments.
  • Used PHP on a LAMP server to develop pages.
  • Developed a server-based web traffic statistical analysis tool using Flask and Pandas.
  • Implemented and tested Python-based web applications interacting with MySQL.
  • Developed a dynamic interaction page in .NET (MS Visual Basic 2014) using SQL developer tools.
  • Built a simple web app for reviewing sitcoms that gives users the ability to view, add, review, up/down vote, and search.
  • Performed joins, group-bys, and other operations in MapReduce using Python (a streaming-style sketch follows this list).
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Wrote MapReduce code to turn unstructured data into semi-structured data and loaded it into Hive tables.
  • Supported MapReduce programs running on the cluster.
  • Executed queries using Hive and developed MapReduce jobs to analyze data.
  • Followed Agile (Scrum) practices, including sprint planning, daily Scrum meetings, and sprint retrospective meetings, to produce quality deliverables on time.
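
A minimal sketch, assuming the Flask-SQLAlchemy extension as one plausible wiring, of the Flask + SQLAlchemy + PostgreSQL REST API described above; the model, route, and connection string are hypothetical.

```python
from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Illustrative PostgreSQL connection string; real credentials would come from config.
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://app:app@localhost/library"
db = SQLAlchemy(app)

class Book(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(200), nullable=False)

@app.route("/books", methods=["GET"])
def list_books():
    # Return all books as JSON.
    return jsonify([{"id": b.id, "title": b.title} for b in Book.query.all()])

if __name__ == "__main__":
    with app.app_context():
        db.create_all()   # create the hypothetical table for the sketch
    app.run()
```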
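
An illustrative Hadoop-Streaming-style sketch of the Python MapReduce group-by mentioned above, with mapper and reducer in one file for brevity; the field positions and the sum aggregation are assumptions. It would be invoked via hadoop-streaming with `python job.py map` as the mapper command and `python job.py reduce` as the reducer command.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit key<TAB>value pairs; assumes tab-delimited input with the key first.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print(f"{fields[0]}\t{fields[1]}")

def reducer(lines):
    # Hadoop Streaming delivers mapper output sorted by key, so groupby works here.
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(float(kv[1]) for kv in group)
        print(f"{key}\t{total}")

if __name__ == "__main__":
    {"map": mapper, "reduce": reducer}[sys.argv[1]](sys.stdin)
```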

Environment: Linux, Python 2.7, Mod Python, Perl, PHP, MySQL, NoSQL, JavaScript, Ajax, Shell Script, HTML, CSS.

