
Application Developer/ Data Engineer Resume


Phoenix, AZ

PROFESSIONAL SUMMARY:

  • 5+ years of experience as an Application Developer, coding and doing analytical programming with Python, PySpark, Django, Flask, AWS, GCP, and SQL.
  • Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
  • Strong knowledge of and experience with the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), and with data pipelines, data analysis, and processing using Hive SQL, Impala, Spark, and Spark SQL.
  • Used Flume, Kafka, and Spark Streaming to ingest real-time and near-real-time data into HDFS.
  • Analyzed data and provided insights with R and Python Pandas.
  • Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing (see the PySpark sketch after this list).
  • Good experience in Python software development (libraries used: Beautiful Soup, NumPy, SciPy, Matplotlib, Pandas DataFrames, network, urllib2, and MySQLdb for database connectivity) and IDEs such as Sublime Text, Spyder, and PyCharm.
  • Expertise in AWS resources such as EC2, S3, EMR, Athena, Redshift, Glue, VPC, ELB, AMI, SNS, RDS, IAM, Route 53, Auto Scaling, CloudFormation, CloudWatch, API Gateway, and Kinesis.
  • Working experience with AWS (Amazon Web Services) cloud infrastructure and running AMI virtual machines on Elastic Compute Cloud (EC2).
  • Experience with AWS Lambda, which runs code in response to events.
  • Experienced with JSON-based RESTful web services and XML-based SOAP web services; worked on various applications using Python IDEs such as Sublime Text and PyCharm.
  • Worked with testing frameworks such as unittest, pytest, and Bazel.
  • Used Django Evolution and manual SQL modifications to modify Django models while retaining all data, with the site in production.
  • Experienced in NoSQL technologies such as MongoDB and Cassandra, and in relational databases such as Oracle, SQLite, PostgreSQL, and MySQL.
  • Developed CloudFormation templates and launched AWS Elastic Beanstalk for deploying, monitoring, and scaling web applications on platforms such as Docker and Python.
  • Extensively worked with automation tools such as Jenkins, Artifactory, SonarQube, Chef, and Puppet for continuous integration and continuous delivery (CI/CD) and to implement end-to-end automation.
  • Good experience working with version control systems such as Git, GitHub, and AWS CodeCommit.
  • Experience using Apache Tomcat servers and Docker containers for deployment.
  • Familiar with issue-tracking tools such as Bugzilla and JIRA.
  • Hands-on experience in data mining and data warehousing using ETL tools; proficient in building reports and dashboards in Tableau (BI tool).
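
Below is a minimal, illustrative PySpark sketch of the kind of ETL transformation layer described above; the paths, column names, and schema are assumptions for demonstration, not details from a specific engagement.

```python
# Minimal PySpark ETL sketch: read raw JSON, apply a transformation layer, write Parquet.
# Paths and column names below are illustrative placeholders, not actual project values.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-transform-sketch").getOrCreate()

raw = spark.read.json("s3a://example-bucket/raw/events/")        # hypothetical source path

cleaned = (
    raw.dropDuplicates(["event_id"])                             # assumed unique key
       .withColumn("event_date", F.to_date("event_ts"))          # derive a partition column
       .filter(F.col("event_type").isNotNull())
)

(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3a://example-bucket/curated/events/"))        # hypothetical target path
```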

EXPERIENCE:

Confidential, Phoenix, AZ

Application Developer/ Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Scala shell commands per the requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Designed, developed, and implemented performant ETL pipelines using Apache Spark's Python API (PySpark) on AWS.
  • Developed Spark and MapReduce jobs to parse the JSON and XML data.
  • Integrated data storage solutions in Spark, especially AWS S3 object storage.
  • Performance-tuned existing PySpark scripts.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory configuration.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze logs produced by the Spark cluster.
  • Handled large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
  • Migrated MapReduce programs to Spark transformations using Spark and Scala.
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt the former in the project.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Imported data from AWS S3 into Spark RDDs to perform transformations and actions on those RDDs.
  • Used Spark SQL to load data, create schema RDDs, and handle structured data.
  • Worked with various file formats such as Avro and Parquet and compression codecs such as Snappy.
  • Experience with cloud technologies: AWS (Lambda, S3, CFTs, CloudWatch rules, Redshift, EC2, EBS, IAM, API Gateway, CloudFormation) and Snowflake.
  • Integrated services such as GitHub, AWS CodePipeline, Jenkins, and AWS Elastic Beanstalk to create a deployment pipeline.
  • Created Python (Boto3) scripts that integrated with the Amazon API to control instance operations.
  • Designed, built, and coordinated an automated build-and-release CI/CD process using GitLab, Jenkins, and Puppet on hybrid IT infrastructure.
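
As a rough illustration of the Kafka-to-Cassandra ingestion described above, here is a hedged PySpark sketch. It uses the newer Structured Streaming API rather than the Spark 1.6 DStream API mentioned in the bullets, the broker, topic, schema, keyspace, and table names are placeholders, and it assumes the spark-cassandra-connector package is available on the cluster.

```python
# Hedged sketch of a Kafka -> Spark -> Cassandra ingestion flow.
# Topic, schema, keyspace, and table names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("learner-stream-sketch").getOrCreate()

schema = (StructType()
          .add("learner_id", StringType())
          .add("course_id", StringType())
          .add("event_ts", TimestampType()))

events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
               .option("subscribe", "learner-events")               # placeholder topic
               .load()
               .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
               .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    # Persist each micro-batch; keyspace/table are placeholders and assume the
    # spark-cassandra-connector is on the classpath.
    (batch_df.write
             .format("org.apache.spark.sql.cassandra")
             .mode("append")
             .options(keyspace="learning", table="learner_events")
             .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```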

Confidential, Detroit, MI

Python Developer/ Data Engineer

Responsibilities:

  • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
  • Designed and architected the various layers of the data lake.
  • Designed star schemas in BigQuery.
  • Analyzed client data using Scala, Spark, and Spark SQL and presented the end-to-end data lake design to the team.
  • Designed the transformation layers for the ETL written in Scala and Spark and distributed the work among the team.
  • Kept the team motivated to deliver the project on time and worked side by side with other members as a team member.
  • Designed and developed Spark jobs in Scala to implement an end-to-end data pipeline for batch processing.
  • Loaded Salesforce data every 15 minutes on an incremental basis into the BigQuery raw and UDM layers using SOQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.
  • Used REST APIs with Python to ingest data from external sources into BigQuery.
  • Built a program with Python and Apache Beam, executed in Cloud Dataflow, to run data validation between raw source files and BigQuery tables.
  • Built a configurable Scala and Spark framework to connect to common data sources such as MySQL, Oracle, Postgres, SQL Server, Salesforce, and BigQuery and load the data into BigQuery.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
  • Opened SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.
  • Submitted Spark jobs using gsutil and spark-submit and had them executed on the Dataproc cluster.
  • Wrote a Python program to maintain raw-file archival in GCS buckets.
  • Analyzed various types of raw files (JSON, CSV, XML) with Python using Pandas, NumPy, etc.
  • Wrote Scala programs for Spark transformations in Dataproc.
  • Used Cloud Functions with Python to load data into BigQuery for CSV files arriving in a GCS bucket (see the sketch after this list).
  • Wrote a program to download a SQL dump from the equipment maintenance site and load it into a GCS bucket; then loaded this SQL dump from the GCS bucket into MySQL (hosted in Google Cloud SQL) and loaded the data from MySQL into BigQuery using Python, Scala, Spark, and Dataproc.
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
  • Created firewall rules to access Google Dataproc from other machines.
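
A hedged sketch of the GCS-triggered Cloud Function pattern referenced above for loading newly arrived CSV files into BigQuery; the project, dataset, and table identifiers are placeholders rather than actual project values.

```python
# Hedged sketch of a background Cloud Function (Python, GCS trigger) that loads
# newly arrived CSV files into BigQuery. The destination table ID is a placeholder.
from google.cloud import bigquery

TABLE_ID = "my-project.raw_layer.sales_events"   # hypothetical destination table

def load_csv_to_bq(event, context):
    """Triggered by a GCS object-finalize event; loads the new CSV into BigQuery."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects

    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # wait for the load job to complete
```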

Confidential, Piscataway, NJ

Python Developer

Responsibilities:

  • Installed Hadoop, MapReduce, and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Managed datasets using Pandas DataFrames and MySQL; queried the MySQL database from Python using the Python MySQL connector and the MySQLdb package to retrieve information.
  • Developed Spark/Scala, Python, and R code for a regular-expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and classify unlabeled data.
  • Evaluated models using cross-validation, the log-loss function, and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana.
  • Involved in web/application development using Python 3.5, HTML5, CSS3, AJAX, JSON, and jQuery.
  • Developed and tested many dashboard features using Python, Java, Bootstrap, CSS, JavaScript, and jQuery.
  • Generated Python Django forms to record data from online users and used pytest for writing test cases.
  • Implemented and modified various SQL queries, functions, cursors, and triggers.
  • Used Pandas for time-series and tabular data manipulation and retrieval.
  • Analyzed and formatted data using machine learning algorithms with Python scikit-learn.
  • Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
  • Automated various manually initiated workflows with Python scripts and Unix shell scripting.
  • Used Python unit and functional testing modules such as unittest, unittest2, mock, and custom frameworks in line with Agile software development methodologies.
  • Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables.
  • Wrote Python scripts to parse JSON documents and load the data into the database.
  • Generated various graphical capacity-planning reports using Python packages such as NumPy and Matplotlib.
  • Analyzed the logs being generated and predicted/forecasted the next occurrence of events with various Python libraries.
  • Developed a single-page application using AngularJS, backed by MongoDB and Node.js.
  • Designed and maintained databases using Python and developed a Python-based API (RESTful web service) using Flask, SQLAlchemy, and PostgreSQL (see the sketch after this list).
  • Expanded website functionality, using the Flask framework in Python to control the web application logic.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Used Celery as the task queue and RabbitMQ/Redis as the message broker to execute asynchronous tasks.
  • Managed API system deployment using a fast HTTP server and the Amazon AWS architecture.
  • Developed common ETL and wrote Python code to format XML documents, helping to source data from different platforms.
  • Wrote UNIX shell scripts for automation.
  • Developed views and templates with the Django view controller and template language to create a user-friendly website interface.
  • Created individual Dockerfiles for deployment by DevOps teams whenever relevant changes were made.
  • Developed consumer-facing features using Django and HTML with test-driven development (TDD).
  • Developed scripts to automate the execution of ETL using Python scripts in a Unix environment.
  • Developed Python web services for processing JSON and interfacing with the Data layer.
  • Increased the speed of pre-existing search indexes through Django ORM optimizations.
  • Developed a module to build Django ORM queries that pre-load data, greatly reducing the number of database queries needed to retrieve the same amount of data.
  • Developed remote integrations with third-party platforms using RESTful web services and successfully implemented Apache Spark and Spark Streaming applications for large-scale data.
  • Built various graphs for business decision-making using the Python Matplotlib library.
  • Involved in developing SOAP web services for sending and receiving data from the external interface in XML format.
  • Improved internally developed ETL tools to make them more robust and flexible.
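
A minimal sketch of a Flask + SQLAlchemy RESTful service of the kind described above (see the bullet referencing Flask, SQLAlchemy, and PostgreSQL); the model, routes, and connection string are illustrative assumptions.

```python
# Hedged sketch of a small Flask + Flask-SQLAlchemy REST API backed by PostgreSQL.
# The model, endpoints, and database URI are placeholders, not project specifics.
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/exampledb"
db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)

@app.route("/users", methods=["GET"])
def list_users():
    # Return all users as JSON.
    return jsonify([{"id": u.id, "name": u.name} for u in User.query.all()])

@app.route("/users", methods=["POST"])
def create_user():
    # Create a user from a JSON payload.
    payload = request.get_json()
    user = User(name=payload["name"])
    db.session.add(user)
    db.session.commit()
    return jsonify({"id": user.id, "name": user.name}), 201

if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    app.run()
```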

Confidential

Software Developer

Responsibilities:

  • Analyzed source data coming from various heterogeneous data sources to be consumed by the ETL that loads data into the data warehouse.
  • Created client and server actions and added permissions for admin and non-admin users, restricting individuals to particular datasets using Flask-Principal permissions and needs.
  • Developed an API that asynchronously distributes tasks using RabbitMQ and Celery.
  • Ported data import jobs from cron jobs to distributed tasks, leading to a speedup.
  • Efficiently performed all backend tasks, from ops up to the REST API interface and portal frontend.
  • Deployed an asynchronous job monitoring system using Celery Flower.
  • Wrote unit tests and performed code reviews.
  • Worked with the search business and search team to implement dynamic rule updates to search using AWS Elasticsearch.
  • Created a MapReduce job in Python to sync PTC configs and PTCs, removing unwanted attributes for products on the .com site.
  • Used basic Hive queries to process large datasets for analyzing 1P, 2P, and 3P products and data from MP sellers, sellers, and suppliers.
  • Developed Spark code in Python for faster processing of data provided by marketplace sellers, generating the best specifications and descriptions for products.
  • Wrote Python normalization scripts to find duplicate data in different environments.
  • Good knowledge of MongoDB workspaces, snapshots, and patching documents in snapshots.
  • Wrote scripts to integrate APIs with third-party applications.
  • Wrote Python scripts to import and export data in CSV and Excel formats from different environments, and created a Celery action invoked via a REST API call.
  • Performed data validation, cleaning, and manipulation with pandas and NumPy, used by reporting teams for data visualization and to generate rankings for seller-provided content (see the sketch after this list).
  • Performed data analysis and missing-value imputation with statistical methods using pandas and NumPy.
  • Worked in an Agile/Scrum environment and handled production rollouts and issues.
  • Extensively used ETL to load data from Oracle and flat files into the data warehouse.
  • Developed new and enhanced search features such as SYNONYM, CANONICAL, and ABBREVIATION to optimize search results and relevancy (JSON, Elasticsearch, Kibana).
  • Extensively used XLSX reader and writer modules to read, write, and analyze data and present the results per client requests.
  • Used Git and Jenkins for continuous integration and deployment.
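
A hedged pandas/NumPy sketch of the validation, cleaning, and ranking step referenced above; the file names, columns, and ranking rule are illustrative assumptions.

```python
# Hedged sketch of a pandas/NumPy cleaning and ranking step for seller content.
# Input file, columns, and ranking logic are placeholders for demonstration only.
import numpy as np
import pandas as pd

df = pd.read_csv("seller_content.csv")          # hypothetical input file

# Basic validation and cleaning: drop exact duplicates, normalize text,
# and impute missing or invalid numeric scores with the column median.
df = df.drop_duplicates()
df["title"] = df["title"].str.strip().str.lower()
df["quality_score"] = df["quality_score"].replace([np.inf, -np.inf], np.nan)
df["quality_score"] = df["quality_score"].fillna(df["quality_score"].median())

# Simple per-seller ranking of content by quality score (illustrative only).
df["rank"] = df.groupby("seller_id")["quality_score"].rank(ascending=False, method="dense")

df.to_excel("seller_content_ranked.xlsx", index=False)   # export for reporting teams
```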
