Senior Data Engineer / Python Developer Resume
SUMMARY
- Around 7 years of experience in the IT industry, involving analytical programming, software development and design, and testing of web-based and client-server multi-tier applications using Python, Django, and backend technologies.
- Experience in implementing MVC/MVW architectures using Servlets, Django, JavaScript, jQuery, Node.js, and RESTful services.
- Experience in developing REST web services supporting JSON to integrate with other external applications and 3rd party systems.
- Strong web technology and scripting language experience with HTML5/4, CSS3/CSS, JSP, AJAX, JavaScript, jQuery, Bootstrap, and NodeJS.
- Hands-on experience in creating Angular factories and using Angular services such as $http and $resource to make RESTful API calls to the Java-based backend.
- Proficient with essential DevOps tools such as Subversion (SVN), Git, GitHub, Chef, Puppet, Ansible, Jenkins, Docker, Kubernetes, Terraform, Tomcat, Nginx, WebSphere, WebLogic, and JBoss, including container systems like Docker and container orchestration with EC2 Container Service and Kubernetes.
- Experience in using build/deploy tools such as Jenkins, Docker, and OpenShift for continuous integration and deployment of microservices.
- Strong knowledge and experience with AWS, specifically Lambda, IAM, API Gateway, DynamoDB, S3, CloudFront, VPC, and EC2.
- Strong experience working with real-time streaming applications and batch-style large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce, and Hive.
- Knowledge of Puppet as a configuration management tool, used to automate repetitive tasks, quickly deploy critical applications, and proactively manage change.
- Experience in Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
- Experience in web development with Python and Django, with a good understanding of ORM and SQLAlchemy, SQL, ETL, Bash/Linux, and asynchronous task queues with Celery and RabbitMQ (a brief sketch follows this summary).
- Expertise in packaged software, Informatica ETL, and reporting tools. Deep understanding of the data warehousing SDLC and the architecture of ETL, reporting, and BI tools.
- Real-time experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Recommended dashboards based on Tableau visualization features and delivered reports to the business team in a timely manner.
- Successfully upgraded Tableau platforms in a clustered environment and performed content upgrades.
- Established best practices for the enterprise Tableau environment and for application intake and development processes.
- Designed, developed, and supported interactive Tableau dashboard reports.
- Experience with marks, the publisher, and security concepts, and in creating worksheets and dashboards using Tableau.
- Good knowledge of developing the required XML Schema documents and implementing the framework for parsing XML documents.
- Familiar with JSON-based REST web services and Confidential Web Services (AWS); responsible for setting up a Python REST API framework using Django, alongside work with the Spring Framework.
- Good experience in developing web applications implementing the MVT architecture using web application frameworks such as Django and Flask, with a good understanding of the Django ORM and SQLAlchemy.
- Extensively worked with automation tools like Jenkins for continuous integration and continuous delivery (CI/CD) and to implement end-to-end automation.
- Designed, developed, implemented, and maintained solutions using Docker, Jenkins, Git, and Puppet for microservices and continuous deployment.
- Experience in the field of Big Data, Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Mining, Text Mining, Natural Language Processing (NLP) and Business Intelligence.
- Proficiency in and understanding of statistical and other tools/languages like R, Python, C, C++, Java, SQL, data visualization tools and Anaplan forecasting tool.
- Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, Spark SQL for Data Mining, Data Cleansing, Data Munging and Machine Learning.
- Hands on experience in Designing and developing API's for the application using Python, Django, MongoDB, Express, ReactJS, and NodeJS.
- Extensively Worked on Star Schema, Snowflake Schema, Data Modeling, Logical and Physical Model, Data Elements, Issue/Question Resolution Logs, and Source to Target Mappings, Interface Matrix and Design elements.
- Expert in creating, configuring and fine-tuning ETL workflows designed in DTS and MS SQL Server Integration Services (SSIS).
- Expertise in working with different databases such as Cassandra, Oracle, MySQL, and PostgreSQL, and good knowledge of the NoSQL database MongoDB.
- Experience in installation, configuration, support, and management of Hadoop clusters, building highly scalable Big Data solutions using Hadoop with multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Experience in developing Map Reduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
- Good Knowledge in Confidential AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, Yarn, Oozie, and Zookeeper.
- Experience in developing and automating applications using Unix shell scripting in the Big Data field, using MapReduce programming for batch processing of jobs on an HDFS cluster along with Hive and Pig.
- Performed unit testing and integration testing and generated test cases for web applications using JUnit and the Python unit test framework, with Hudson/Jenkins builds triggered by each push to Git.
- Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics, data wrangling, and Excel data extracts.
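Below is a minimal, illustrative sketch of the Celery/RabbitMQ asynchronous task queue work referenced in the summary above. The app name, task, and broker URL are hypothetical placeholders, not details taken from any specific project.

    # tasks.py - minimal Celery sketch assuming a local RabbitMQ broker
    # (hypothetical names; requires the celery package and a running broker).
    from celery import Celery

    app = Celery(
        "etl_tasks",                                   # hypothetical app name
        broker="amqp://guest:guest@localhost:5672//",  # default RabbitMQ URL
        backend="rpc://",                              # simple result backend
    )

    @app.task(bind=True, max_retries=3)
    def normalize_record(self, record):
        """Clean a single record; retried on transient failures."""
        try:
            return {key.strip().lower(): value for key, value in record.items()}
        except Exception as exc:
            raise self.retry(exc=exc, countdown=5)

    if __name__ == "__main__":
        # Caller side: enqueue work asynchronously and wait for the result.
        result = normalize_record.delay({" Name ": "Ada", " Role ": "Engineer"})
        print(result.get(timeout=10))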
TECHNICAL SKILLS
- Python 2.x/3.x, Go, Scala, SQL, PL/SQL, SAS, PEP8, pip, Spark, Requests, Scrapy, SQLAlchemy, BeautifulSoup, NumPy, SciPy, Matplotlib, Pygame, Pyglet, PyQt, PyGTK, pywin32, NLTK, nose, OpenCV, SymPy, IPython, Caffe, Torch, TensorFlow, Django, Flask, Pyramid, Twisted, Muffin, CherryPy, TastyPie, Pyjamas, gui2py, PySide
- Tkinter, PyForms, CVS, Git, Mercurial, SVN, GitHub, Jenkins, Chef, Puppet, Ansible, Docker, Kubernetes, PyUnit, pytest, PyMock, Mocker, Antiparser, webunit, webtest, Selenium, Splinter, PyChecker, Komodo, PyCharm, PyDev, PyScripter, PyShield, Spyder, Jupyter, MySQL, Teradata, SQL Server, InfluxDB, MongoDB, IntelliJ, Cassandra, PostgreSQL
- Splunk, Bugzilla, Jira, HP ALM, HP Quality Center, Software Development Life Cycle (SDLC), Agile, Waterfall, Hybrid, TDD, XP, BDD, EDD, Pair Programming, Scrum, ELK (Elasticsearch, Logstash, Kibana), Solr, Kanban, Kafka, Swagger, OpenStack, Confidential Web Services (AWS), Microsoft Azure, Boto3, Jinja, Mako, AMQP, Celery
- Apache Tomcat, RabbitMQ, Celery, Heroku, Samba, Confluence, Bamboo, AJAX, jQuery, JSON, XML, XSLT, LDAP, OAuth, SOAP, REST, Microservices, Active Directory, design patterns, HTML/HTML5, CSS/CSS3, JavaScript, PhosphorJS, AngularJS, NodeJS, EmberJS, ReactJS, Bootstrap, Big Data and Hadoop technologies
- Operating Systems: Linux, Unix
- RDBMS: Oracle, Teradata, DB2, SQL Server 2012/2008, MySQL, MS Access
- NoSQL: MongoDB, HBase, Cassandra
- ETL Tools: Informatica PowerCenter / PowerExchange, Informatica Data Quality (IDQ), Talend, Apache NiFi
- Languages: Java EE, Python, JavaScript
- SQL: ANSI SQL, T-SQL, PL/SQL, BTEQ
- Scripting: UNIX Shell Scripting, Perl, Windows Batch Script, VB, PowerShell, YAML
- Versioning / CI-CD: SVN, Git, SourceTree, Bitbucket, Bamboo, JIRA, Confluence
- IDE: SQL*Plus, SQL Developer, TOAD, SQL Navigator, Query Analyzer, SQL Server Management Studio, SQL Assistant, Eclipse, Postman
- Analytics/BI: MicroStrategy, SPSS, IBM Cognos, OBIEE, Business Objects
- Other: HTML, CSS, jQuery, Thymeleaf, XML, JSON, PHP, MS Visio, Erwin, Confluence, Spring Framework, SQL*Loader, Tidal, Oozie, SAS, R
PROFESSIONAL EXPERIENCE
Senior Data Engineer / Python Developer
Confidential
Responsibilities:
- Created a Python/Django-based web application, using Python scripting for data processing, MySQL for the database, and HTML/CSS/jQuery with Highcharts for data visualization of the served pages.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Created Angular services and integrated them with RESTful APIs. Used the Angular http service to send GET/POST requests to fetch data from the server, and used open-source libraries such as Angular Material to create customized components.
- Utilized AngularJS UI-Router to manage state transitions and URL routes, used AngularJS dependency injection to inject services including custom services, and created custom directives for reusable components.
- Created Jenkins build and deployment pipeline jobs to deploy the docker images into AWS ECR repositories and integrated with GITHUB.
- Involved in Design, analysis, Implementation, Testing and support of ETL processes for Stage, ODS and Mart.
- Prepared ETL standards and naming conventions and wrote ETL flow documentation for Stage, ODS, and Mart. Used the Informatica debugger to test the data flow and fix the mappings.
- Worked with the Informatica Data Quality (IDQ) toolkit for analysis, data cleansing, data matching, data conversion, address standardization, exception handling, and the reporting and monitoring capabilities of IDQ.
- Profiled the source data to understand it and performed data quality checks using Informatica Data Quality, then loaded the cleansed data to landing tables.
- Worked with the Informatica Data Quality Developer/Analyst tools to remove noise from the data using transformations such as Standardization, Merge and Match, Case Conversion, Consolidation, Parser, Labeler, Address Validation, Key Generator, Lookup, and Decision.
- Defined best practices for Tableau report development.
- Implemented Tableau BI Dashboard reporting solution for different groups in the organization.
- Published dashboards to Tableau Server, from which consumers could choose their viewing medium (laptop, PC, iPad).
- Developed microservices by creating REST APIs and used them to access data from different suppliers and to gather network traffic data from servers. Used Jenkins pipelines to drive all microservice builds out to the Docker registry.
- Used Ansible to manage system configuration to facilitate interoperability between existing infrastructure and new infrastructure in alternate physical data centers or the cloud (AWS).
- Designed the data models to be used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
- Automated AWS volume snapshot backups for the enterprise using Lambda. Created functions and assigned roles in AWS Lambda to run Python scripts. Built S3 buckets, managed policies for them, and used S3 and Glacier for storage and backup on AWS.
- Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation from Dev to Eval, Eval to Pre-Prod, and Pre-Prod to Production systems using Jenkins, Git, SVN, and the Chef automation tool.
- Consumed external APIs and wrote RESTful API using Django REST Framework and Angular. Equally comfortable working within the Django ORM or writing native SQL in SQL Server.
- Developed a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL and custom tools developed in Python and Bash.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
- Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers. Developed microservice onboarding tools leveraging Python and Jenkins, allowing for easy creation and maintenance of build jobs and Kubernetes deployments and services.
- Used EC2 Container Service (ECS) to support Docker containers to easily run applications on a managed cluster of Confidential EC2 instances.
- Worked on creating Docker containers and Docker consoles for managing the application life cycle. Set up Docker on Linux and configured Jenkins to run under the Docker host.
- Documented the company's RESTful APIs using Swagger for internal and third-party use, and worked on unit testing and integration testing.
- Designed and developed Hadoop ETL solutions to move data to the data lake using big data tools such as Sqoop, Hive, Spark, HDFS, and Talend.
- Developed PySpark code using Python and Spark SQL for faster testing and data processing. Analyzed the SQL scripts and designed the solution for implementation in PySpark.
- Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations (a brief sketch follows this role's bullet list). Utilized the pandas, NumPy, SciPy, and NLTK libraries to create unique reports and analyses.
- Designed the ETL Solution and created High Level and Detailed design document based on technical requirements document.
- Involved in Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with Shell script.
- Used SSIS to create ETL packages to validate, extract, transform, and load data into data warehouse databases and data mart databases, and processed SSAS cubes to store data in OLAP databases.
- Designed AWS Lambda functions in Python as an enabler for triggering shell scripts to ingest data into MongoDB and export data from MongoDB to consumers.
- Worked on the MongoDB write concern to avoid loss of data during system failures and implemented read preferences in the MongoDB replica set.
- Experienced in creating data pipelines integrating Kafka with a Spark Streaming application.
- Worked with container-based deployments using Docker, working with Docker images, Docker Hub and Docker registries and Kubernetes.
- Used Spark SQL to read data from external sources and processed the data using the Scala computation framework. Used the Spark Cassandra Connector to load data to and from Cassandra.
- Designed tables and columns in Redshift for data distribution across the data nodes in the cluster, keeping columnar database design considerations in mind.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive. Configured Hadoop clusters and coordinated with Big Data admins for cluster maintenance.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Implemented usage of Confidential EMR for processing Big Data across a Hadoop cluster of virtual servers on Confidential Elastic Compute Cloud (EC2) and Confidential Simple Storage Service (S3).
- Involved in development of Web Services using REST for sending and getting data from the external interface in the JSON format.
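As referenced in the PySpark bullet above, the following is a minimal, illustrative PySpark job of the read / merge / enrich / load shape described there. The S3 paths, column names, and join key are hypothetical placeholders, not actual project details.

    # Minimal PySpark enrichment sketch (hypothetical paths and column names).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("enrichment_sketch").getOrCreate()

    # Read raw data from an external source plus a reference data set.
    claims = spark.read.option("header", True).csv("s3a://example-bucket/raw/claims/")
    members = spark.read.parquet("s3a://example-bucket/reference/members/")

    # Merge the two sources, enrich with a load date, and drop invalid rows.
    enriched = (
        claims.join(members, on="member_id", how="left")
              .withColumn("load_date", F.current_date())
              .filter(F.col("claim_amount").cast("double") > 0)
    )

    # Load into the target destination in a columnar format.
    enriched.write.mode("overwrite").parquet("s3a://example-bucket/curated/claims/")

    spark.stop()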
Data Engineer / Python Engineer
Confidential - Minnetonka, Minnesota
Responsibilities:
- Created a Python/Django-based web application with a PostgreSQL database and integrations with third-party email, messaging, and storage services.
- Developed Python code to gather the data from HBase and designed the solution for implementation in PySpark.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra.
- Designed and developed logical and physical data models that utilize concepts such as Star Schema, Snowflake Schema, and Slowly Changing Dimensions.
- Responsible for the development, support, and maintenance of the ETL (Extract, Transform, and Load) processes using Oracle and Informatica PowerCenter.
- Created common reusable objects for the ETL team and oversaw coding standards.
- Reviewed high-level design specification, ETL coding and mapping standards.
- Designed new database tables to meet business information needs. Designed Mapping document, which is a guideline to ETL Coding.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Sqoop, Pig, Hive, Impala, and NoSQL databases. Developed Hadoop data processes using Hive and/or Impala.
- Worked with the ZooKeeper and Accumulo stack, aiding in the development of specialized indexes for performant queries on big data implementations.
- Worked on deploying a Hadoop cluster with multiple nodes and different big data analytic tools including Pig, the HBase database, and Sqoop, gaining good experience with NoSQL databases.
- Used Informatica B2B Data Exchange to handle EDI (Electronic Data Interchange) for handling payments on the scheduled dates.
- Worked with Informatica Cloud to create source/target connections, monitor, and synchronize the data with Salesforce.
- Worked with Informatica Cloud to create source and target objects and developed source-to-target mappings.
- Experience with Informatica BDE-related work on HDFS, Hive, Oozie, Spark, and Sqoop.
- Created Rich dashboards using Tableau Desktop and prepared user stories to create compelling dashboards to deliver actionable insights.
- Responsible for interaction with business stakeholders, gathering requirements, and managing the delivery.
- Connected to Tableau Server to publish dashboards to a central location.
- Used different features of Tableau to create drill-downs, filters, and interactivity based on user requirements; created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
- Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries.
- Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in Redshift.
- Worked on creating real-time data streaming solutions using Apache Spark / Spark Streaming and Kafka.
- Analyzed the SQL scripts and designed solutions for implementation in PySpark, and utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test, and deploy.
- Involved in unit testing and in preparing and using test data/cases to verify the accuracy and completeness of the ETL process.
- Worked on enhancing the ETL packages & deploying them to the server from development to production environment.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Launched AWS EMR instances through Confidential Web Services for Spark applications; implemented Spark using PySpark and Spark SQL for faster processing of data.
- Created data tables utilizing PySpark and PyQt to display customer and policy information and to add, delete, and update customer records.
- Wrote RESTful web services to communicate with MongoDB and performed CRUD operations on MongoDB using RESTful web API services.
- Involved in design, implementation, and modification of the Python code and MySQL database schema on the back end.
- Used the pandas API to put the data into time series and tabular format for easy timestamp-based data manipulation and retrieval.
- Developed the application's RESTful API server using Django REST Framework, designing and developing various endpoints, defining Models, Serializers, and ViewSets and registering the corresponding URLs to the endpoints using DRF routers (a brief sketch follows this role's bullet list).
- Developed views and templates with Python and the Django view controller and templating language to create a user-friendly website interface.
- Involved in writing an API for Confidential Lambda to manage some of the AWS services.
- Worked with Apache Spark, which provides a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
- Used Confidential Elastic Beanstalk with Confidential EC2 instances to deploy the Django project to AWS. Configured continuous integration with Jenkins on Confidential EC2.
- Created functions and assigned roles in AWS Lambda to run Python scripts and used AWS Lambda to perform event-driven processing.
- Implemented AWS CodePipeline and created CloudFormation JSON templates alongside Terraform for infrastructure as code. Wrote Terraform scripts for CloudWatch alerts.
- Worked with Puppet as a configuration management tool to automate repetitive tasks, quickly deploy critical applications, and proactively manage change.
- Worked on creating various types of indexes on different collections to get good performance from the MongoDB database.
- Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deployed them to Kubernetes; created pods and managed them using Kubernetes. Utilized Kubernetes as the runtime environment of the CI/CD system to build, test, and deploy.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
- Extracted data using T-SQL in SQL Server, writing queries, stored procedures, triggers, views, temp tables, and user-defined functions (UDFs).
- Worked with container-based deployments using Docker, working with Docker images, Docker Hub, Docker registries, and Kubernetes.
- Worked on creating Docker containers and Docker consoles for managing the application life cycle. Set up Docker on Linux and configured Jenkins to run under the Docker host.
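As referenced in the Django REST Framework bullet above, this is a minimal sketch of the Model / Serializer / ViewSet / Router pattern. The Policy model, its fields, and the route name are hypothetical, and the snippet assumes a Django project with rest_framework installed and added to INSTALLED_APPS.

    # Minimal Django REST Framework sketch (hypothetical Policy model/fields).
    from django.db import models
    from rest_framework import serializers, viewsets, routers

    class Policy(models.Model):
        policy_number = models.CharField(max_length=32, unique=True)
        holder_name = models.CharField(max_length=128)
        premium = models.DecimalField(max_digits=10, decimal_places=2)

    class PolicySerializer(serializers.ModelSerializer):
        class Meta:
            model = Policy
            fields = ["id", "policy_number", "holder_name", "premium"]

    class PolicyViewSet(viewsets.ModelViewSet):
        # Standard CRUD endpoints (list/retrieve/create/update/destroy).
        queryset = Policy.objects.all()
        serializer_class = PolicySerializer

    # urls.py: register the endpoint with a DRF router.
    router = routers.DefaultRouter()
    router.register(r"policies", PolicyViewSet)
    urlpatterns = router.urls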
Python Developer
Confidential
Responsibilities:
- Used Python to write data into JSON files for testing Django Websites. Created scripts for data modelling and data import and export.
- Connected the continuous integration system with the Git version control repository and continually built as check-ins came in from developers.
- Built all database mapping classes using Django models and Cassandra. Used the pandas API to put the data into time series and tabular format for easy timestamp-based data manipulation and retrieval.
- Implemented AWS high availability using AWS Elastic Load Balancing (ELB), which balanced load across instances.
- Created Terraform scripts for EC2 instances, Elastic Load balancers and S3 buckets. Implemented Terraform to manage the AWS infrastructure and managed servers using configuration management tools like Chef and Ansible.
- Worked with container-based deployments using Docker, working with Docker images, Docker Hub and Docker registries and Kubernetes.
- Built Jenkins pipeline to drive all microservices builds out to the Docker registry and then deployed to Kubernetes.
- Involved in the CI/CD pipeline management for managing the weekly releases. Worked on packages like socket, REST API, Django.
- Deployed the project to Confidential Web Services (AWS) using Confidential Elastic Beanstalk. Developed entire frontend and backend modules using Python on Django, including the Tastypie web framework, using Git.
- Worked in the MySQL database on simple queries and on writing stored procedures for normalization, and deployed the project through Jenkins using the Git version control system.
- Configured and maintained Jenkins to implement the CI process and integrated the tool with Ant and Maven to schedule the builds.
- Developed Unix shell scripts to perform ELT operations on big data, such as running Sqoop, creating external/internal Hive tables, and initiating HQL scripts.
- Designed and developed Hadoop ETL solutions to move data to the data lake using big data tools such as Sqoop, Hive, Spark, HDFS, and Talend.
- Knowledge of push down optimization concepts and tuning Informatica objects for optimum execution timelines.
- Experienced with identifying Performance bottlenecks and fixing code for Optimization in Informatica and Oracle.
- Utilized all Tableau tools including Tableau Desktop, Tableau Public and Tableau Reader.
- Experience in installation, configuration, and administration of Tableau Server in a multi-server and multi-tier environment.
- Extensive experience on building dashboards in Tableau and Involved in Performance tuning of reports and resolving issues within Tableau Server and Reports.
- Created and maintained various reporting dashboards to consolidate multiple views for internal and external purposes.
- Involved in back-end development using Python with the Django framework, building RESTful web services with a Python REST API framework.
- Used the Celery framework to develop a new feature supporting parallel processes and completing multiple requests simultaneously. Designed Celery and multithreading solutions for scheduling tasks and multiple activities.
- Worked on complex mappings, mapplets, and workflows to meet business needs and ensured transformations were reusable to avoid duplication.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift. Used a JSON schema to define table and column mappings from S3 data to Redshift.
- Worked in various transformations like Lookup, Joiner, Sorter, Aggregator, Router, Rank and Source Qualifier to create complex mapping.
- Extensively used ETL to transfer and extract data from source files (Flat files and DB2) and load the data into the target database.
- Extensive SQL querying for Data Analysis and wrote, executed, performance tuned SQL Queries for Data Analysis & Profiling. Extracted business rule and implemented business logic to extract and load SQL server using T-SQL.
- Created Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, grain, dimensions and measured facts.
- Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Pair RDDs.
- Created interactive data charts on the web application using the Highcharts JavaScript library with data coming from Apache Cassandra.
- Worked on Jenkins, installing, configuring, and maintaining it for continuous integration (CI) and for end-to-end automation of all builds and deployments.
- Developed the notification service by posting the JSON request in AWS API Gateway, Validated the response in Lambda by getting the data from Dynamo DB and sending the notification through AWS SNS.
- Implemented REST APIs in Python using a micro-framework, Flask, with SQLAlchemy in the backend for management of the data center resources on which OpenStack would be deployed (a brief sketch follows this role's bullet list).
- Wrote Python modules to connect to and view the Apache Cassandra instance. Wrote a Python script that analyzes Apache access logs and loads the required data into MongoDB collections.
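As referenced in the Flask bullet above, this is a minimal sketch of a Flask REST API with SQLAlchemy, using the Flask-SQLAlchemy extension (an assumption) to manage a hypothetical data center resource. The Server model, field names, routes, and database URI are illustrative placeholders only.

    # Minimal Flask + SQLAlchemy REST sketch (hypothetical resource and fields;
    # uses a local SQLite file purely for illustration).
    from flask import Flask, jsonify, request
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///resources.db"
    db = SQLAlchemy(app)

    class Server(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        hostname = db.Column(db.String(255), unique=True, nullable=False)
        status = db.Column(db.String(32), default="available")

    @app.route("/servers", methods=["GET"])
    def list_servers():
        # Return all registered servers as JSON.
        servers = Server.query.all()
        return jsonify([
            {"id": s.id, "hostname": s.hostname, "status": s.status}
            for s in servers
        ])

    @app.route("/servers", methods=["POST"])
    def create_server():
        # Register a new server from a JSON payload.
        payload = request.get_json()
        server = Server(hostname=payload["hostname"])
        db.session.add(server)
        db.session.commit()
        return jsonify({"id": server.id}), 201

    if __name__ == "__main__":
        with app.app_context():
            db.create_all()
        app.run(debug=True)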