
Data Engineer/Architect Resume


Wilmington, DE

PROFESSIONAL SUMMARY:

  • 6+ years of experience as a Web/Application Developer, with analytical programming in Python.
  • Around 4 years of experience as a programmer developing applications and 4 years of experience in Big Data.
  • Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and agile methodologies.
  • Expert knowledge of and experience in object-oriented design and programming concepts.
  • Experience applying object-oriented programming (OOP) concepts using Python, C++, and PHP.
  • Experienced in WAMP (Windows, Apache, MySQL, and Python/PHP) and LAMP (Linux, Apache, MySQL, and Python/PHP) architectures.
  • Good understanding of Hadoop, HDFS, MapReduce, Flume, Kafka, the Hadoop ecosystem (Pig, Hive, HBase), and Spark.
  • Experience in leading multiple efforts to build Hadoop platforms, maximizing business value by combining data science with big data.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase, and in monitoring them with Cloudera Manager and Hortonworks.
  • Expertise in writing Hive queries and Pig and MapReduce scripts, and in loading large datasets from the local file system and HDFS into Hive.
  • Advised organizations on big data strategy and implementation, including which technologies best fit their needs, and implemented the selected big data solutions.
  • Experienced in developing web-based applications using Python, Django, PHP, C++, XML, CSS, HTML, DHTML, JavaScript and jQuery.
  • Experienced in installing, configuring, modifying, testing and deploying applications with Apache.
  • Well versed with design and development of presentation layer for web applications using technologies like HTML, CSS, and JavaScript.
  • Familiar with JSON based REST Web services and Amazon Web services.
  • Experienced in developing Web Services with Python programming language.
  • Experience in writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
  • Experienced in writing SQL queries, stored procedures, functions, packages, tables, views, and triggers.
  • Experienced in agile and waterfall methodologies, with high-quality deliverables delivered on time.
  • Experience in utilizing SAS procedures, macros, and other SAS applications for data extraction, data cleansing, data loading, and reporting.
  • Maintained detailed documentation and architectural solutions in IT infrastructure and sales systems.
  • Very strong full life cycle application development experience.
  • Strong database design and programming skills in SQL Server 2012/2008/2005, SQL Stored Procedures, functions, triggers, Cursors, Indexing, importing/exporting data from varied data sources
  • Experience with continuous integration and automation using Jenkins.
  • Experience with Unit testing/ Test-driven Development (TDD), Load Testing.
  • Have the ability to understand complex systems and be in command of the details to provide solutions.
  • Ability to learn and adapt quickly to the emerging new technologies and paradigms.
  • Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
  • Practical experience with working on multiple environments like development, testing, production.
  • Hands-on experience in writing and reviewing requirements, architecture documents, test plans, design documents, quality analysis and audits.
  • Excellent analytical and problem-solving skills, with the ability to work independently as well as being a valuable, contributing team player.

TECHNICAL SKILLS:

OS Platforms: Linux/Unix, Windows 98/NT, Mac OS X

Hadoop distributions: Hortonworks and Cloudera

Languages: Python 2.x/3.x, Java, Shell, C, C#

Databases: MySQL, SQL Server 2008, PostgreSQL, Oracle, Teradata

Web Technologies: AJAX, AWS EC2 Cloud, Amazon S3, JavaScript, HTML, XML

Versioning Tools: Git, SVN, Stash

Web servers: Apache, Nginx, Tomcat

Framework: Django, Flask

Other Tools: PuTTY, SuperPuTTY, pgAdmin III, JIRA, Visual Studio

WORK EXPERIENCE:

Confidential - Wilmington, DE

Data Engineer/Architect

Responsibilities:

  • Participated in developing a marketing analytics platform and Hadoop data lake, and took part in requirement discussions and BRD reviews.
  • Brought multiple data feeds into Greenplum as part of the marketing analytics platform, based on business requirements.
  • Persisted and retrieved data using PostgreSQL as the relational database.
  • Developed writable and readable external tables in Greenplum to access data from the Hadoop data lake (a minimal sketch appears after this list).
  • Developed shell scripts to perform the end-to-end ETL flow for data.
  • Worked with the bulk load API to pull customer opportunity data from Salesforce using SOQL.
  • Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries.
  • Designed tables in Hive and MySQL, and used Sqoop to import and export data between the databases and HDFS.
  • Hands-on experience ingesting data into the data warehouse using various data loading techniques.
  • Transferred and loaded datasets from Hive tables to Greenplum using GPHDFS PULL.
  • Experienced working with salesforce.com sandbox environments.
  • Hands-on experience running Hive queries in the Spark shell.
  • Used a Python framework to ingest data into Hadoop Hive tables.
  • Used a test-driven approach for developing the application and implemented unit tests before deployment.
  • Built APIs to download data from Google Cloud and DCM (DoubleClick Campaign Manager).
  • Developed and updated existing Spark jobs written in Scala to align with the business process.
  • Wrote SQL queries that addressed critical requirements by extracting the required data.
  • Well versed with the Software Development Life Cycle and agile methodology.
  • Automated jobs using the Control-M scheduler.
  • Worked with multiple databases, including MySQL, Oracle, Teradata, SQL Server, and Sybase, as part of the ETL process.
  • Coded new programs and implemented new functionality in existing programs.
  • Developed a new application to support data feature engineering using C# with Spark as the backend.
  • Strong grasp of troubleshooting Spark issues.
  • Used Spark to ingest data from Teradata into Hadoop as Hive tables.
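
A minimal sketch of the Greenplum external-table pattern referenced above; the connection details, schema, table names, and HDFS path are hypothetical placeholders, not the actual project objects:

    # Hypothetical sketch: create a gphdfs-readable external table in Greenplum
    # over files in the Hadoop data lake, then copy it into a regular table.
    import psycopg2

    conn = psycopg2.connect(host="gp-master", dbname="analytics",
                            user="etl_user", password="...")  # placeholder credentials
    ddl = """
        CREATE READABLE EXTERNAL TABLE staging.web_leads_ext (
            lead_id BIGINT, campaign TEXT, created_dt DATE
        )
        LOCATION ('gphdfs://namenode:8020/data/marketing/web_leads')
        FORMAT 'TEXT' (DELIMITER '|');
    """
    with conn, conn.cursor() as cur:
        cur.execute(ddl)
        # load the external data into a heap table for downstream reporting
        cur.execute("INSERT INTO mart.web_leads SELECT * FROM staging.web_leads_ext;")
    conn.close()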

Environment: Cloudera Hadoop Distribution (CDH 5.6), HDFS, Hive, HUE, Impala, Sqoop, Python 2.7, MySQL, MS SQL Server, Linux, Shell Scripting, Teradata, Oracle, PostgreSQL, Greenplum, Spark, C#.

Confidential - Wausau, WI

Sr Python Developer with Hadoop

Responsibilities:

  • Participated in developing a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
  • Worked with a team of Hadoop developers on maintaining the data platform applications for RISK management.
  • Designed tables in Hive and MySQL, and used Sqoop to import and export data between the databases and HDFS.
  • Experienced in processing large datasets of different forms, including structured, semi-structured, and unstructured data.
  • Developed REST APIs in Python using the Flask and Django frameworks.
  • Integrated various data sources, including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
  • Exposure to various data formats, including XML, JSON, and CSV.
  • Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
  • Hands-on experience ingesting data into the data warehouse using various data loading techniques.
  • Performed regular audits of business development and marketing data, providing guidance to users as necessary.
  • Analyzed financial and marketing data using Excel pivot tables.
  • Developed Python scripts to load data from HDFS into Hive (see the sketch after this list).
  • Participated in developing ETL components for executing various workflows.
  • Developed Pig and Hive scripts for processing the data.
  • Handled JSON, XML, and log data using Hive SerDes and Pig, and filtered the data based on query criteria.
  • Developed the data lake with marketing analytics data.
  • Worked in an agile methodology with two-week sprints.
  • Scheduled jobs using crontab, Rundeck, and Control-M.
  • Performed branching, tagging, and release activities using the version control tools Git and GitLab.
  • Ran data import and export jobs to copy data to and from HDFS using Sqoop.
  • Worked with the marketing data team on retrieving data in batches for push notification prioritization.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Ingested data received from various database providers onto HDFS using Sqoop for analysis and data processing.
  • Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
  • Managed data imported from different sources, performed transformations using Hive, Pig, and MapReduce, and loaded the data into HDFS.
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • To achieve the continuous delivery goal in a highly scalable environment, used Docker coupled with Nginx as a load balancer.
  • Developed an Oozie workflow to run jobs based on the availability of transaction data.
  • Created user-defined functions (UDFs) for maintaining incremental IDs.
  • Used shell scripting to analyze data from the SQL Server source and processed it for storage in HDFS.
  • Gained good exposure to Spark MLlib, Streaming, and SQL by working closely with data scientists.
  • Generated reports from Hive data using MicroStrategy.
  • Wrote complex SQL queries involving joins.
  • Improved HiveQL execution time by applying compression techniques to MapReduce jobs.
  • Created Hive partitions to store data for different companies under separate partitions.
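
A minimal sketch in the spirit of the HDFS-to-Hive loading scripts mentioned above; the HiveServer2 host, database, table, and HDFS path are hypothetical, and the PyHive client is assumed to be available:

    # Hypothetical sketch: load a file already landed in HDFS into a partitioned Hive table.
    from pyhive import hive

    conn = hive.Connection(host="hive-server", port=10000, database="marketing")
    cursor = conn.cursor()
    cursor.execute("""
        LOAD DATA INPATH '/landing/campaign_events/2017-05-01'
        INTO TABLE campaign_events PARTITION (event_dt='2017-05-01')
    """)
    cursor.close()
    conn.close()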

Environment: Hadoop, Hive, Sqoop, Pig, Python 2.7/3, Java, Django 1.4, Flask, XML, MySQL, MS SQL Server, Linux, Shell Scripting, MongoDB, SQL.

Confidential - Johnston, RI

Sr Python Developer

Responsibilities:

  • Participated in the requirement gathering and analysis phase of the project, documenting the business requirements by conducting workshops/meetings with various business users.
  • Worked with team of developers on Python applications for RISK management.
  • Developed Python/Django application for Google Analytics aggregation and reporting.
  • Used Django configuration to manage URLs and application parameters.
  • Worked on Python OpenStack APIs.
  • Used Python scripts to update content in the database and manipulate files.
  • Generated Django forms to record data from online users.
  • Gained a detailed understanding of the existing build system and the related tools that track information on various products, releases, and test results.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Configured EC2 instances and set up IAM users and roles.
  • Created an S3 data pipe using the Boto API to load data from internal data sources (see the sketch after this list).
  • Configured a JBoss cluster and MySQL database for application access.
  • Developed UDFs to provide custom Hive and Pig capabilities.
  • Built a mechanism for automatically moving existing proprietary binary-format data files to HDFS using an ingestion service.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation.
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Ingested data into Hadoop using Sqoop and applied data transformations using Pig and Hive.
  • Used Python and Django for creating graphics, XML processing, data exchange, and business logic implementation.
  • Developed PL/SQL stored procedures to convert data from Oracle to MongoDB.
  • Used the Pandas API to put data into time-series and tabular formats for easy timestamp-based data manipulation and retrieval.
  • Automated report generation in MongoDB using JavaScript, shell scripting, and Java.
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
  • Systems automation utilizing Control-M for scheduling and PowerShell/C# for script development.
  • Used Pandas library for statistical Analysis.
  • Developed tools using Python, Shell scripting, XML to automate some of the menial tasks. Interfacing with supervisors, artists, systems administrators and production to ensure production deadlines are met.
  • Worked very closely with the designer, tightly integrating Flash into the CMS using FlashVars stored in the Django models; also created XML with Django to be consumed by the Flash.
  • Used HTML, CSS, jQuery, JSON and JavaScript for front end applications.
  • Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS and JavaScript.
  • Also used Bootstrap as a mechanism to manage and organize the HTML page layout.
  • Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
  • Involved in development of Web Services using SOAP for sending and getting data from the external interface in the XML format.
  • Worked on development of SQL and stored procedures on MySQL.
  • Responsible for debugging the project monitored on JIRA (Agile).
  • Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
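
A minimal sketch of the S3 loading pattern mentioned above; it uses boto3 for illustration, and the bucket name, key, and local path are placeholders:

    # Hypothetical sketch: push a locally generated extract to S3.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        "/tmp/risk_extract.csv",                  # file produced from an internal data source
        "example-analytics-bucket",               # placeholder bucket name
        "raw/risk/2016-03-01/risk_extract.csv")   # destination key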

Environment: Python 2.7, Hive, Oozie, Amazon AWS S3, Django 1.4, HTML5, CSS, XML, MySQL, MS SQL Server, JavaScript, AWS, Linux, Shell Scripting, AJAX, MongoDB.

Confidential - NJ

Python Developer

Responsibilities:

  • Worked with team of developers on Python applications for RISK management.
  • Designed the database schema for the content management system.
  • Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS and JavaScript.
  • Involved in development of Web Services using SOAP for sending and getting data from the external interface in the XML format.
  • Wrote Python routines to log into websites and fetch data for selected options (see the sketch after this list).
  • Performed testing using Django's Test Module.
  • Skilled in installing, configuring, and using Apache Hadoop ecosystem components such as Pig and Spark.
  • Built the entire Hadoop platform from scratch.
  • Experience in ingesting real-time/near-real-time data using Flume, Kafka, and Storm.
  • Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications to eventually adopt them as part of the Big Data Hadoop initiative.
  • Estimated the software and hardware requirements for the NameNode and DataNodes in the cluster.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase using MapReduce.
  • Wrote MapReduce programs and Hive UDFs in Java.
  • Developed Hive queries for the analysts.
  • Created an email notification service, triggered on job completion, for the team that requested the data.
  • Defined job workflows according to their dependencies in Oozie.
  • Closely observed the building of the reporting application, which uses Spark SQL to fetch table data and generate reports.
  • Knowledgeable in performance troubleshooting and tuning of Hadoop clusters in Cloudera.
  • Worked on the middle tier and persistence layer; created service and model layer classes and value objects/POJOs to hold values between Java classes and database fields.
  • Exported/Imported data between different data sources using SQL Server Management Studio. Maintained program libraries, users' manuals and technical documentation.
  • Responsible for debugging and troubleshooting the web application.
  • Successfully migrated all the data to the database while the site was in production.
  • Implemented the validation, error handling, and caching framework with Oracle Coherence cache.
  • Worked on scripts for setting up the discovery client with attribute data, and on Granite reference data scripts for setting up adapter attributes in the Granite system.
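
A minimal sketch of the login-and-fetch routines mentioned above, using the requests library; the URLs, form fields, and query parameters are placeholders:

    # Hypothetical sketch: authenticate against a site and pull data for a selected option.
    import requests

    session = requests.Session()
    session.post("https://example.com/login",
                 data={"username": "svc_account", "password": "..."})  # placeholder credentials
    response = session.get("https://example.com/reports",
                           params={"option": "daily_positions"})
    response.raise_for_status()
    data = response.json()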

Environment: Python 2.7, Hadoop, Django 1.4, HTML5, CSS, XML, MySQL, JavaScript, jQuery, MongoDB, MS SQL Server, GitHub, AWS, Linux, Shell Scripting, AJAX.

Confidential

Software Engineer / Python

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modelling, analysis, design and development.
  • Generated Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
  • Responsible for the entire data migration from Sybase ASE server to Oracle.
  • Migrated API code written for Sybase to Oracle.
  • Oversaw the migration activity of PL/SQL programs.
  • Migrated the PL/SQL code from Sybase to Oracle.
  • Migrated the data contained in the earlier ASPL database from Sybase to Oracle.
  • Migrated the libraries written using Sybase APIs to Oracle's OCCI APIs.
  • Automated testing using Python (see the sketch below).
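
A minimal sketch of the style of Python test automation used here; the conversion helper and test data are illustrative, not the project's actual code:

    # Hypothetical sketch: unittest case for a small Sybase-to-Oracle conversion helper.
    import unittest
    from datetime import datetime

    def to_oracle_date(sybase_date):
        # Convert an MM/DD/YYYY string to Oracle's DD-MON-YYYY format.
        return datetime.strptime(sybase_date, "%m/%d/%Y").strftime("%d-%b-%Y").upper()

    class ConversionTests(unittest.TestCase):
        def test_date_conversion(self):
            self.assertEqual(to_oracle_date("03/15/2012"), "15-MAR-2012")

    if __name__ == "__main__":
        unittest.main()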

Environment: Python 2.7, Shell Scripting, PL/SQL, SVN, Quality Center, Solaris, Windows, Perl.
