
Data Engineer Resume


NY

SUMMARY

  • 7+ years of professional experience as a Data Engineer, including analytical programming in Python.
  • Experience as an AWS Cloud Engineer (Administrator) working with AWS services: IAM, EC2, VPC, AMI, SNS, RDS, SQS, EMR, Lambda, Glue, Athena, DynamoDB, CloudWatch, Auto Scaling, S3, and Route 53.
  • Worked in various Linux server environments from DEV all the way to PROD, alongside cloud-powered strategies built on Amazon Web Services (AWS).
  • Good knowledge of web services using the gRPC and GraphQL protocols.
  • Used gRPC and GraphQL as a data gateway.
  • Strong experience in CI (Continuous Integration)/CD (Continuous Delivery) software development pipeline stages such as Commit, Build, Automated Tests, and Deploy, using Bogie Pipelines in Jenkins.
  • Experience using Snowflake as an analytic data warehouse.
  • Experience using Databricks to handle the full analytical process, from ETL to data modeling, by leveraging familiar tools, languages, and skills via interactive notebooks or APIs.
  • Experience with Apache Airflow to author workflows as directed acyclic graphs (DAGs), visualize batch and real-time data pipelines running in production, monitor progress, and troubleshoot issues when needed (a minimal DAG sketch follows this list).
  • Experience with Quantum frameworks to easily ingest, process, and act on batch and streaming data using Apache Spark.
  • Worked on Docker containers, combining them with the workflow to keep them lightweight.
  • Experience tuning EMR to requirements when importing and exporting data using stream-processing platforms such as Kafka.
  • Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce, and AWS CloudWatch.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
  • Experienced in moving data from different sources using Kafka producers and consumers and preprocessing the data.
  • Proficient in Data Warehousing and Data Mining concepts and in ETL transformations from source to target systems.
  • Experience building servers on AWS, including importing necessary volumes and launching EC2 instances.
  • Created security groups, auto-scaling, load balancers, Route 53, and SNS as per the architecture.
  • Experience setting up lifecycle policies to back up data from AWS S3 to AWS Glacier; worked with various AWS, EC2, and S3 CLI tools.
  • Expertise in DevOps, Release Engineering, Configuration Management, Cloud Infrastructure, and Automation, including Amazon Web Services (AWS), Apache Maven, Jenkins, GitHub, and Linux.
  • Experienced in creating user/group accounts and federated users and managing access to them using the AWS IAM service.
  • Set up databases in AWS using RDS, storage using S3 buckets, and configured instance backups to S3 buckets.
  • Expertise in querying RDBMSs such as PostgreSQL, MySQL, and SQL Server using SQL to ensure data integrity.
  • Experienced in working on big data integration and analytics based on Hadoop and Kafka.
  • Excellent understanding and knowledge of Hadoop Distributed File System data modeling, architecture, and design principles.
  • Experience in Hadoop cluster performance tuning by gathering and analyzing the existing infrastructure.
  • Constructed a Kafka broker with proper configurations for the organization's big data needs.
  • Good experience in shell scripting, SQL Server, UNIX, and Linux, and knowledge of the version control software GitHub.
  • Strong team player with the ability to work independently as well as in a team, adapt to a rapidly changing environment, and commit to learning; possess excellent communication, project management, documentation, and interpersonal skills.
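
As an illustration of the Airflow experience above, the following is a minimal sketch of a batch DAG, assuming Airflow 2.x; the DAG id, task callables, and schedule are hypothetical placeholders rather than an actual production pipeline.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

def extract(**context):
    # Placeholder extract step; real jobs pulled from sources such as Kafka or S3.
    return "raw_batch"

def load(**context):
    # Placeholder load step; real jobs wrote to targets such as Snowflake or S3.
    pass

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="batch_pipeline_example",      # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load                   # extract runs before load
```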

TECHNICAL SKILLS

Big Data Eco System: Hadoop, Kafka, Sqoop, Zookeeper, Spark, Avro, Parquet and Snappy.

Languages: Python, Scala, SQL, Linux shell scripting

Databases: Oracle, DB2, SQL Server, MySQL, PL/SQL, NoSQL, RDS, HBase, PostgreSQL

AWS: EC2, S3, EMR, DynamoDB, Athena, AWS Data Pipeline, AWS Lambda, CloudWatch, SNS, SQS

Microservices Tools: gRPC, GraphQL

CICD Tools: Jenkins, Artifactory.

Build Tools: Maven

IDE/Programming Tools: NetBeans, PyCharm

Containerization Tools: Docker

Source control tools: GIT

Operating Systems: UNIX, Linux, Windows.

J2EE Technologies: Servlets, JDBC.

Web Technologies: HTML, CSS, XML, Bootstrap

Libraries and Tools: PySpark, psycopg2, PySpells, PyArrow, Pandas, MySQL, Boto3, Jira, Scrum

PROFESSIONAL EXPERIENCE:

Data Engineer

Confidential, NY

Responsibilities:

  • Crafted highly scalable and resilient cloud architectures that address customer business problems and accelerate the adoption of AWS services for clients.
  • Built application and database servers using AWS EC2, created AMIs, and used RDS for PostgreSQL.
  • Carried out deployments and builds on various environments using the continuous integration tool Jenkins; designed the project workflows/pipelines using Jenkins as the CI tool.
  • Used Terraform to express infrastructure as code when building EC2, Lambda, and RDS resources.
  • Built analytical warehouses in Snowflake and queried data in staged files by referencing metadata columns in a staged file.
  • Performed continuous data loads using Snowpipe with appropriate file sizing, and loaded structured and semi-structured data into Snowflake using web interfaces.
  • Involved in designing APIs for networking and cloud services; leveraged Spark (PySpark) to manipulate unstructured data and apply text mining to users' table-utilization data.
  • Designed a data quality framework to perform schema validation and data profiling on Spark (PySpark); a minimal sketch follows this list.
  • Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities like Apache Spark, written in Scala.
  • Used the Pandas API to store data in time series and tabular format for easy timestamp-based data manipulation and retrieval.
  • Integrated the web portal and its users with an S3 bucket; used Amazon S3 to back up database instances periodically by saving snapshots of data.
  • Created a Kafka broker in structured streaming to obtain structured data by schema using case classes; configured the Kafka broker for the project's Kafka cluster and streamed the data to Spark structured streaming.
  • Implemented Spark on EMR for processing enterprise data across our data lake in AWS.
  • Fine-tuned EC2 for long-running Spark applications to achieve better parallelism and more executor memory for caching.
  • Experience working with Docker Hub, creating Docker images, and handling multiple images, primarily for middleware installations and domain configuration.
  • Developed Git hooks for the local repository (code commit) and the remote repository (code push), and worked on GitHub.
  • Developed Airflow workflows to schedule batch and real-time data movement from source to target.
  • Backed up AWS Postgres to S3 via a daily job run on EMR using DataFrames.
  • Worked on ETL processing consisting of data transformation, data sourcing, mapping, conversion, and loading.
  • Responsible for the technical architecture and creation of technical specifications, and for designing ETL processes such as mapping and the source, target, and staging databases.
  • Knowledge of cloud-based DAGs and Apache Airflow.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
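
A minimal sketch of the PySpark schema-validation and profiling pattern referenced above; the feed name, column names, and S3 path are hypothetical illustrations, not the actual framework.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("data-quality-sketch").getOrCreate()

# Expected contract for an illustrative "table utilization" feed (hypothetical columns).
expected_schema = StructType([
    StructField("user_id", StringType(), False),
    StructField("table_name", StringType(), False),
    StructField("usage_gb", DoubleType(), True),
    StructField("event_ts", TimestampType(), True),
])

df = spark.read.parquet("s3://example-bucket/utilization/")  # hypothetical input path

# Schema validation: fail fast if incoming columns or types drift from the contract.
expected = {(f.name, f.dataType) for f in expected_schema.fields}
actual = {(f.name, f.dataType) for f in df.schema.fields}
missing = expected - actual
if missing:
    raise ValueError("Schema drift detected: %s" % missing)

# Basic profiling: null counts per column plus summary statistics on the numeric field.
df.select([F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]).show()
df.describe("usage_gb").show()
```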

Environment: Python 2.7 & 3.7, AWS Lambda, SNS, SQS, EMR, EC2, CloudWatch, RDS, Spark, Linux, Shell Scripting, GitHub, Jira

AWS/Python Developer

Confidential, Malvern, PA

Responsibilities:

  • Wrote scripts in Python to automate testing-framework jobs.
  • Used a multi-threading factory model to distribute the learning-process back-testing into various worker processes.
  • Loaded data into S3 as a single file using a multi-threaded Python process, and developed a Python wrapper for instantiating the multi-threaded application.
  • Implemented configuration changes for data models.
  • Maintained and updated existing automated solutions.
  • Handled potential points of failure through error handling and communication of failures.
  • Used the Amazon EC2 command-line interface along with Bash/Python to automate repetitive work.
  • Created a Python script to monitor server load performance in the production environment and horizontally scale the servers by deploying new instances.
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Implemented lock mechanisms and used multithreading functionality.
  • Built a Python/Django-based web application with a PostgreSQL database and integrations with third-party email, messaging, and storage services.
  • Used the Pandas API to store data in time series and tabular format for easy timestamp-based data manipulation and retrieval.
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
  • Developed and delivered presentations on logical data modeling, application process modeling, and metadata management for business units.
  • Used an object-relational mapper (ORM) to automate the transfer of data stored in relational database tables into objects.
  • Successfully migrated the database from SQLite to MySQL to PostgreSQL with complete data integrity.
  • Wrote shell scripts to spin up clusters such as EMR, EC2, and Amazon RDS.
  • Anticipated potential points of failure (database, communication points, file-system errors).
  • Troubleshot process execution and worked with other team members to correct issues.
  • Actively worked as part of a team with managers and other staff to meet the goals of the project in the stipulated time.
  • Performed troubleshooting and fixed and deployed many Python bug fixes for the two main applications that were the primary source of data for both customers and the internal customer-service team.
  • Used the Pandas library for statistical analysis.
  • Used NumPy for numerical analysis of insurance premiums.
  • Used psycopg2, called from Python, to connect to RDS.
  • Optimized large data loads to S3 using a chunk-size method (see the sketch after this list).
  • Managed large datasets using Pandas DataFrames and MySQL.
  • Extensively used Python modules such as requests, urllib, and urllib2 for web crawling.
  • Invoked Lambda through CloudWatch triggers.
  • Worked on deployment to AWS EC2 instances with Postgres RDS and S3 file storage.
  • Developed the required XML Schema documents and implemented the framework for parsing XML documents.
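
A minimal sketch of the chunked RDS-to-S3 load pattern referenced above, using psycopg2, Pandas, and Boto3; the host, table, bucket, and chunk size are hypothetical placeholders.

```python
import boto3
import pandas as pd
import psycopg2

# Hypothetical connection details; real credentials came from the environment/secrets store.
conn = psycopg2.connect(host="example-rds.amazonaws.com", port=5432,
                        dbname="appdb", user="app_user", password="***")
s3 = boto3.client("s3")

# Stream the table in fixed-size chunks so the full result set is never held in memory,
# writing each chunk to S3 as its own CSV part file.
query = "SELECT * FROM policies"          # hypothetical table
for i, chunk in enumerate(pd.read_sql(query, conn, chunksize=50000)):
    body = chunk.to_csv(index=False).encode("utf-8")
    s3.put_object(Bucket="example-backup-bucket",
                  Key="backups/policies/part-{:05d}.csv".format(i),
                  Body=body)

conn.close()
```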

Environment: Python 2.7, Django, HTML5/CSS, Postgres, AWS EMR, EC2, SNS, SQS, MySQL, JavaScript, Eclipse, Linux, Shell Scripting, jQuery, GitHub, Jira, PySpark, Bootstrap, Jenkins

AWS/ Python Developer

Confidential, St. Louis, MO

Responsibilities:

  • Booted up nodes using prebuilt images on Amazon EC2; uploaded, copied, downloaded, and deleted files using Amazon S3.
  • Worked on loading CSV/TXT/DAT files with Scala in the Spark framework, creating Spark DataFrames and RDDs to process the data and saving the files in Parquet format in HDFS for loading into a fact table using the ORC reader (a PySpark sketch of the pattern follows this list).
  • Worked on various applications using Python-integrated IDEs: Eclipse, PyCharm, and NetBeans.
  • Designed and developed an entire module called CDC (change data capture) in Python and deployed it in AWS Glue using the PySpark library and Python.
  • Built database models, views, and APIs using Python for interactive web-based solutions.
  • Used Python scripts to update content in the database and manipulate files.
  • Wrote and executed several complex SQL queries in AWS Glue for ETL operations on Spark DataFrames using Spark SQL.
  • Automated most daily tasks using Python scripting.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • As the point person on the L2 support team, worked with the application, database, reporting, SQM, and project management teams to resolve issues.
  • Responsible for user validations on the client side as well as the server side.
  • Automated existing scripts for performance calculations using NumPy and SQLAlchemy.
  • Interacted with QA to develop test plans from high-level design documentation.
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Implemented RESTful web services for sending and receiving data between multiple systems.
  • Sent text data over sockets to most APIs using JSON.
  • Responsible for debugging and troubleshooting the web application.
  • Developed, tested, and debugged software tools used by clients and internal customers.
  • Coded test programs and evaluated existing engineering processes.
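
A minimal PySpark sketch of the CSV-to-Parquet landing pattern referenced above (the production jobs were written in Scala); the schema, paths, and partition column are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, DateType)

spark = SparkSession.builder.appName("csv-to-parquet-sketch").getOrCreate()

# An explicit schema keeps typing deterministic for delimited feeds (hypothetical columns).
schema = StructType([
    StructField("order_id", StringType(), False),
    StructField("customer_id", StringType(), True),
    StructField("quantity", IntegerType(), True),
    StructField("order_date", DateType(), True),
])

df = (spark.read
      .option("header", "true")
      .option("delimiter", ",")
      .schema(schema)
      .csv("hdfs:///landing/orders/*.csv"))   # hypothetical landing path

# Land the cleansed data as Parquet, partitioned for downstream fact-table loads.
(df.dropDuplicates(["order_id"])
   .write.mode("overwrite")
   .partitionBy("order_date")
   .parquet("hdfs:///warehouse/orders_parquet/"))
```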

Environment: Linux, Python 2.7/2.5, Django 1.7/1.5, HTML5, XML, JavaScript, MS SQL Server, NoSQL, Amazon S3, Jenkins, Git, GitHub, JIRA, AWS services.

Software developer

Confidential

Responsibilities:

  • Implemented user-interface guidelines and standards throughout the development and maintenance of the website using HTML, CSS, JavaScript, and jQuery.
  • Contributed to the development of security policies and processes.
  • Developed views and templates with the Python view controller and templating language to create a user-friendly website interface.
  • Refactored Python modules to deliver a specific format of data.
  • Developed Python APIs to dump the array structures in the processor at the failure point for debugging.
  • Involved in the development of web services using SOAP to send data to and receive data from the external interface in XML format.
  • Created RESTful web services for catalog and pricing with MySQL, NoSQL, and MongoDB.
  • Interacted with the backend using Java and the Hibernate framework.
  • Involved in coding Enterprise JavaBeans, which implement business rules and business logic.
  • Involved in developing the Java classes and JavaBeans.
  • Represented the system in hierarchical form by defining its components and subcomponents using Python, and developed a set of library functions over the system based on user needs.
  • Used the Selenium library to write a fully functioning test-automation process that simulated submitting different requests from multiple browsers to the web application.
  • Created data tables utilizing PyQt to display customer and policy information and to add, delete, and update customer records.
  • Provided strong Python expertise to support Q-Direct, the major part of whose code is in Python.
  • Used Python for creating graphics, XML processing, data exchange, and business-logic implementation.
  • Utilized in-depth technical knowledge of LAMP and other leading-edge products and technologies, in conjunction with industry and business skills, to deliver solutions to customers.
  • Developed multiple Spark batch jobs in Scala using Spark SQL, performed transformations using many APIs, and updated master data in the Cassandra database as per the business requirements.
  • Wrote Spark/Scala scripts, creating multiple UDFs, a Spark context, a Cassandra SQL context, and multiple APIs and methods that support DataFrames, RDDs, DataFrame joins, and Cassandra table joins, and finally wrote/saved the DataFrames/RDDs to the Cassandra database (a PySpark sketch of the pattern follows this list).
  • As part of a POC, migrated data from the source systems to another environment using Spark and Spark SQL.
  • Developed and implemented core API services using Python with Spark.
  • Experienced in extract, transform, load (ETL).
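
A minimal PySpark sketch of the Cassandra read/join/write pattern referenced above (the production jobs were written in Scala), assuming the DataStax spark-cassandra-connector is available on the Spark classpath; the keyspaces, tables, and host are hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes the spark-cassandra-connector package is supplied at submit time
# (e.g. --packages com.datastax.spark:spark-cassandra-connector_2.12:3.x).
spark = (SparkSession.builder
         .appName("cassandra-master-data-sketch")
         .config("spark.cassandra.connection.host", "cassandra.example.internal")
         .getOrCreate())

def read_table(keyspace, table):
    # Read a Cassandra table as a DataFrame through the connector's data source.
    return (spark.read.format("org.apache.spark.sql.cassandra")
            .options(keyspace=keyspace, table=table)
            .load())

customers = read_table("master", "customers")          # hypothetical master table
updates = read_table("staging", "customer_updates")    # hypothetical staged updates

# Join the staged updates to the master rows and write the merged result back to Cassandra.
merged = customers.join(updates, on="customer_id", how="left")
(merged.write.format("org.apache.spark.sql.cassandra")
       .mode("append")
       .options(keyspace="master", table="customers_enriched")
       .save())
```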

Environment: Linux, Python 2.7/2.5, HTML5, XML, JavaScript, jQuery, MS SQL Server, NoSQL, Jenkins, MongoDB, Beautiful Soup, Eclipse, Git, GitHub, JIRA.

Software developer

Confidential

Responsibilities:

  • Translated customer requirements into design specifications and ensured that the requirements were translated into the software solution.
  • The application was based on a service-oriented architecture and used Python 2.5, Ajax, HTML, and CSS for the frontend.
  • Involved in creating Servlets and Java Server Pages (JSP), which route submittals to the appropriate Enterprise JavaBean (EJB) components and render retrieved information using a Session Facade.
  • Designed the front end of the application using Python, HTML, CSS, AJAX, JSON, and jQuery; worked on the backend of the application, mainly using Active Records.
  • Involved in the design, development, and testing phases of the application using the AGILE methodology.
  • Developed and designed an API (RESTful web service).
  • Used the Python language to develop web-based data-retrieval systems.
  • Designed and maintained databases using Python and developed a Python-based API (RESTful web service) using Flask, SQLAlchemy, and PostgreSQL (a minimal sketch follows this list).
  • Developed websites using Python, XHTML, CSS, and JavaScript.
  • Developed and designed email marketing campaigns using HTML and CSS.
  • Tested and implemented applications built using Python.
  • Developed and tested many features for a dashboard using Python, the Robot Framework, Bootstrap, CSS, and JavaScript.
  • Created complex dynamic HTML UIs using jQuery.
  • Automated regression analysis for determining fund returns based on index returns (Python/Excel).
  • Worked on the development of SQL, stored procedures, triggers, and functions on MySQL and NoSQL databases.
  • Developed a shopping cart for the library and integrated web services to access payments.
  • Used the PHP language on a LAMP server to develop pages.
  • Developed a server-based web-traffic statistical analysis tool using Flask and Pandas.
  • Implemented and tested Python-based web applications interacting with MySQL.
  • Developed dynamic interaction pages in .NET (MS Visual Basic 2014) using SQL Developer tools.
  • Built a simple web app for reviewing sitcoms that gives users the ability to view, add, review, up/down-vote, and search entries.
  • Performed joins, group-bys, and other operations in MapReduce using Python.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Wrote MapReduce code to turn unstructured data into semi-structured data and loaded it into Hive tables.
  • Supported MapReduce programs running on the cluster.
  • Executed queries using Hive and developed MapReduce jobs to analyze data.
  • Involved in AGILE (SCRUM) practices and sprint planning, attending daily agile (SCRUM) meetings and sprint retrospective meetings to produce quality deliverables within time.
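
A minimal sketch of a Flask/SQLAlchemy REST API backed by PostgreSQL, in the spirit of the API work referenced above; it uses the Flask-SQLAlchemy extension, and the model, routes, and connection string are hypothetical.

```python
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Hypothetical connection string; the real one pointed at the project's PostgreSQL instance.
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://app_user:***@localhost:5432/library"
db = SQLAlchemy(app)

class Book(db.Model):
    # Illustrative model for a library catalog item.
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(200), nullable=False)

@app.route("/books", methods=["GET"])
def list_books():
    # Return all books as JSON.
    return jsonify([{"id": b.id, "title": b.title} for b in Book.query.all()])

@app.route("/books", methods=["POST"])
def add_book():
    # Create a new book from the JSON payload.
    book = Book(title=request.json["title"])
    db.session.add(book)
    db.session.commit()
    return jsonify({"id": book.id, "title": book.title}), 201

if __name__ == "__main__":
    with app.app_context():
        db.create_all()   # create tables on first run
    app.run()
```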

Environment: Linux, Python 2.7, mod_python, Perl, PHP, MySQL, NoSQL, JavaScript, Ajax, Shell Script, HTML, CSS.
